OpenCLIP not installed
[... warning repeated once per launched rank; duplicate copies elided ...]
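Every rank prints this warning at import time, which is why it floods the head of the raw log. A minimal sketch of the kind of optional-dependency guard that typically emits it follows; the module layout, flag name, and exact message handling are assumptions, not taken from the actual training code:

```python
# Hypothetical optional-import guard for OpenCLIP (a sketch, not the
# project's real code). The warning is harmless if the run does not
# need an OpenCLIP vision tower.
try:
    import open_clip  # provided by the open_clip_torch package
    HAS_OPEN_CLIP = True
except ImportError:
    HAS_OPEN_CLIP = False
    print("OpenCLIP not installed")
```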
[2025-02-16 07:18:50,001] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[... same accelerator-detection line repeated by every rank, timestamps 07:18:50,001 through 07:18:51,247 ...]
[2025-02-16 07:18:55,092] [INFO] [comm.py:652:init_distributed] cdb=None
[... same init_distributed line repeated by every rank, timestamps 07:18:55,092 through 07:18:56,377 ...]
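Both lines come from DeepSpeed's startup path: accelerator auto-detection, then communication-backend setup. A minimal sketch of the call that produces them, assuming standard DeepSpeed usage rather than the project's actual launcher code:

```python
# Sketch only: mirrors common DeepSpeed initialization, not the
# training script itself. On import/first use DeepSpeed logs
# "Setting ds_accelerator to cuda (auto detect)"; init_distributed
# then logs "cdb=None", reporting that no custom communication
# backend object has been constructed yet (interpretation, not
# stated in the log).
import deepspeed

deepspeed.init_distributed(dist_backend="nccl")
```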
ModelArguments(
    model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct',
    version='qwen',
    freeze_backbone=True,
    tune_mm_mlp_adapter=False,
    vision_tower=None,
    gen_vision_tower='eva-clip-E-14-plus',
    mm_vision_select_layer=-2,
    pretrain_mm_mlp_adapter=None,
    pretrain_gen_mlp_adapter=None,
    vision_tower_pretrained=None,
    mm_projector_type='mlp2x_gelu',
    gen_projector_type='mlp2x_gelu',
    mm_use_im_start_end=False,
    mm_use_im_patch_token=False,
    mm_patch_merge_type='flat',
    mm_vision_select_feature='patch',
    n_query=64,
    n_und_query=729,
    gen_pooling='early_pool2d_4'
)
DataArguments(
    data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json',
    lazy_preprocess=True,
    is_multimodal=False,
    image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K',
    pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411',
    datacomp_shortcaption_image_folder=None,
    datacomp_longcaption_image_folder=None,
    data_type='mix',
    image_aspect_ratio='square'
)
TrainingArguments(
    _n_gpu=1,
    accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
    adafactor=False,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    auto_find_batch_size=False,
    average_tokens_across_devices=False,
    batch_eval_metrics=False,
    bf16=True,
    bf16_full_eval=False,
    bits=16,
    cache_dir=None,
    data_seed=None,
    dataloader_drop_last=False,
    dataloader_num_workers=4,
    dataloader_persistent_workers=False,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=None,
    ddp_backend=None,
    ddp_broadcast_buffers=None,
    ddp_bucket_cap_mb=None,
    ddp_find_unused_parameters=None,
    ddp_timeout=1800,
    debug=[],
    deepspeed=./scripts/zero1.json,
    disable_tqdm=False,
    dispatch_batches=None,
    do_eval=False,
    do_predict=False,
    do_train=False,
    double_quant=True,
    eval_accumulation_steps=None,
    eval_delay=0,
    eval_do_concat_batches=True,
    eval_on_start=False,
    eval_steps=None,
    eval_strategy=no,
    eval_use_gather_object=False,
    evaluation_strategy=None,
    fp16=False,
    fp16_backend=auto,
    fp16_full_eval=False,
    fp16_opt_level=O1,
    freeze_mm_mlp_adapter=False,
    fsdp=[],
    fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
    fsdp_min_num_params=0,
    fsdp_transformer_layer_cls_to_wrap=None,
    full_determinism=False,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs=None,
    greater_is_better=None,
    group_by_length=False,
    group_by_modality_length=False,
    half_precision_backend=auto,
    hub_always_push=False,
    hub_model_id=zhaojiang/llava-clip-text-image-16-nodes,
    hub_private_repo=None,
    hub_strategy=every_save,
    hub_token=,
    ignore_data_skip=False,
    include_for_metrics=[],
    include_inputs_for_metrics=False,
    include_num_input_tokens_seen=False,
    include_tokens_per_second=False,
    jit_mode_eval=False,
    label_names=None,
    label_smoothing_factor=0.0,
    learning_rate=0.0001,
    length_column_name=length,
    load_best_model_at_end=False,
    local_rank=0,
    log_level=passive,
    log_level_replica=warning,
    log_on_each_node=True,
    logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-83,
    logging_first_step=False,
    logging_nan_inf_filter=True,
    logging_steps=1.0,
    logging_strategy=steps,
    lora_alpha=16,
    lora_bias=none,
    lora_dropout=0.05,
    lora_enable=False,
    lora_r=64,
    lora_weight_path=,
    lr_scheduler_kwargs={},
    lr_scheduler_type=constant_with_warmup,
    max_grad_norm=1.0,
    max_steps=-1,
    metric_for_best_model=None,
    mm_projector_lr=None,
    model_max_length=2048,
    mp_parameters=,
    mpt_attn_impl=triton,
    neftune_noise_alpha=None,
    no_cuda=False,
    num_train_epochs=3.0,
    optim=adamw_torch,
    optim_args=None,
    optim_target_modules=None,
    output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen,
    overwrite_output_dir=False,
    past_index=-1,
    per_device_eval_batch_size=4,
    per_device_train_batch_size=8,
    prediction_loss_only=False,
    push_to_hub=False,
    push_to_hub_model_id=None,
    push_to_hub_organization=None,
    push_to_hub_token=,
    quant_type=nf4,
    ray_scope=last,
    remove_unused_columns=False,
    report_to=['wandb'],
    restore_callback_states_from_checkpoint=False,
    resume_from_checkpoint=None,
    run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4,
    save_on_each_node=False,
    save_only_model=False,
    save_safetensors=True,
    save_steps=1000,
    save_strategy=steps,
    save_total_limit=1,
    seed=42,
    skip_memory_metrics=True,
    split_batches=None,
    tf32=True,
    torch_compile=False,
    torch_compile_backend=None,
    torch_compile_mode=None,
    torch_empty_cache_steps=None,
    torchdynamo=None,
    tpu_metrics_debug=False,
    tpu_num_cores=None,
    use_cpu=False,
    use_ipex=False,
    use_legacy_prediction_loop=False,
    use_liger_kernel=False,
    use_mps_device=False,
    warmup_ratio=0.01,
    warmup_steps=0,
    weight_decay=0.0,
)
[... identical ModelArguments/DataArguments/TrainingArguments dumps from the remaining ranks elided; they differ only in local_rank (0-7) and the node suffix of logging_dir (h100-st-p548xlarge-31/-34/-81/-83) ...]
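The dump references deepspeed=./scripts/zero1.json, but that file is not shown in the log. A plausible ZeRO stage-1 config compatible with the bf16=True, batch-size-8, grad-accumulation-1 settings above might look roughly like this; the contents are an assumption, not the project's actual file:

```python
# Assumed shape of ./scripts/zero1.json (hypothetical). With the
# HuggingFace Trainer integration, "auto" values are filled in from
# TrainingArguments at runtime.
import json

zero1 = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
print(json.dumps(zero1, indent=2))
```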
lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, 
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-83, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, 
tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, )
TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
bits=16,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=./scripts/zero1.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
double_quant=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=None,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
freeze_mm_mlp_adapter=False,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
group_by_modality_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=zhaojiang/llava-clip-text-image-16-nodes,
hub_private_repo=None,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=3,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-83,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lora_alpha=16,
lora_bias=none,
lora_dropout=0.05,
lora_enable=False,
lora_r=64,
lora_weight_path=,
lr_scheduler_kwargs={},
lr_scheduler_type=constant_with_warmup,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mm_projector_lr=None,
model_max_length=2048,
mp_parameters=,
mpt_attn_impl=triton,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
quant_type=nf4,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=steps,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.01,
warmup_steps=0,
weight_decay=0.0,
)
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4')
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square')
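The three dumps above are emitted by every training process immediately after argument parsing, which is why they repeat across ranks. As a point of reference, a minimal sketch of the LLaVA-style parsing step that produces them follows; HfArgumentParser and parse_args_into_dataclasses are the real transformers APIs, but the ModelArguments/DataArguments field sets shown are assumptions reconstructed from the printed values (the run's TrainingArguments is clearly a subclass extended with LoRA/quantization fields such as lora_enable, bits, and double_quant).

from dataclasses import dataclass
from typing import Optional

import transformers

@dataclass
class ModelArguments:
    # Assumed field set, reconstructed from the log; defaults mirror the dump.
    model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    version: str = "qwen"
    freeze_backbone: bool = True
    gen_vision_tower: str = "eva-clip-E-14-plus"
    mm_projector_type: str = "mlp2x_gelu"
    gen_projector_type: str = "mlp2x_gelu"
    n_query: int = 64
    n_und_query: int = 729
    gen_pooling: str = "early_pool2d_4"

@dataclass
class DataArguments:
    # Assumed field set, reconstructed from the log.
    data_path: Optional[str] = None
    lazy_preprocess: bool = True
    is_multimodal: bool = False
    image_folder: Optional[str] = None
    data_type: str = "mix"
    image_aspect_ratio: str = "square"

if __name__ == "__main__":
    # The real script extends TrainingArguments with extra fields; the stock
    # class is used here to keep the sketch self-contained.
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, transformers.TrainingArguments)
    )
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    print(model_args, data_args, training_args)  # one banner per rank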
[the same ModelArguments / DataArguments / TrainingArguments banner repeats for the remaining processes; the copies are identical except for local_rank (0-7 within each node) and the node suffix of logging_dir (h100-st-p548xlarge-36/-81/-82/-83/-84), and are collapsed here]
[2025-02-16 07:18:56,546] [INFO] [comm.py:652:init_distributed] cdb=None   [repeated 8x, once per local rank]
[2025-02-16 07:18:56,546] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
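The init_distributed lines mark the point where the HF Trainer hands off to DeepSpeed using the config referenced by deepspeed=./scripts/zero1.json. That file's contents are not shown in the log, so the following is only a representative ZeRO stage-1 config consistent with the flags in the dump (bf16=True, gradient_accumulation_steps=1, max_grad_norm=1.0); the "auto" placeholders are the stock HF/DeepSpeed convention for inheriting values from TrainingArguments at launch.

import json

# Hypothetical reconstruction of ./scripts/zero1.json -- the real file is not
# shown in the log. ZeRO stage 1 shards only optimizer state across ranks.
zero1_config = {
    "bf16": {"enabled": "auto"},               # follows bf16=True
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",  # follows per_device_train_batch_size=8
    "gradient_accumulation_steps": "auto",     # follows gradient_accumulation_steps=1
    "gradient_clipping": "auto",               # follows max_grad_norm=1.0
}

with open("scripts/zero1.json", "w") as f:
    json.dump(zero1_config, f, indent=2)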
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-36, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, 
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 
'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-32, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') 
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-33, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-38, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, 
eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', 
pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, 
logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, 
load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, 
ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, 
gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, 
length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, 
include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, 
dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-34, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, )
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4')
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square')
(The ModelArguments / DataArguments / TrainingArguments block above is emitted once by every training process; across those per-rank dumps only local_rank, which runs 0-7 on each host, and logging_dir, which points at the per-host run directories Feb16_07-18-55_h100-st-p548xlarge-30/-31/-34/-35/-82/-84 and Feb16_07-18-56_h100-st-p548xlarge-37, differ.)
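The three blocks above are plain dataclass reprs printed at startup. The log does not show the parsing code, but LLaVA-style training scripts conventionally build these objects with transformers.HfArgumentParser; the sketch below illustrates that pattern under that assumption. The dataclass field lists are abbreviated stand-ins rather than the project's actual definitions, and the script name train.py is hypothetical.

```python
# Minimal sketch of how ModelArguments / DataArguments / TrainingArguments
# are typically produced in LLaVA-style training code (assumption: this
# project follows the same HfArgumentParser pattern; field lists abbreviated).
from dataclasses import dataclass, field
from typing import Optional

import transformers


@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = field(default="Qwen/Qwen2.5-VL-7B-Instruct")
    gen_vision_tower: Optional[str] = field(default="eva-clip-E-14-plus")
    gen_pooling: Optional[str] = field(default="early_pool2d_4")


@dataclass
class DataArguments:
    data_path: Optional[str] = field(default=None)
    lazy_preprocess: bool = field(default=True)
    data_type: str = field(default="mix")


@dataclass
class TrainingArguments(transformers.TrainingArguments):
    # Fields in the dump that are not part of the stock TrainingArguments
    # (lora_*, bits, double_quant, quant_type, mm_projector_lr, ...) are
    # custom additions layered on top of the base class, as sketched here.
    model_max_length: int = field(default=2048)
    lora_enable: bool = field(default=False)


if __name__ == "__main__":
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, TrainingArguments)
    )
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    # Each rank prints its parsed arguments, which is why the dump above
    # repeats once per process.
    print(model_args, data_args, training_args)
```

Invoked as, say, python train.py --output_dir /tmp/out --deepspeed ./scripts/zero1.json, each rank would print exactly one copy of the three dumps.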
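A few of the logged values pin down the effective batch geometry: per_device_train_batch_size=8 with gradient_accumulation_steps=1, local_rank values 0-7 (so 8 GPUs per host), and a run name advertising 16 nodes. The arithmetic below works that out; the node count is inferred from the run name, and the dataset size is a made-up placeholder.

```python
# Back-of-the-envelope batch and warmup arithmetic for the run above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
gpus_per_node = 8          # local_rank 0..7 observed per host
num_nodes = 16             # assumption, inferred from run_name "...16-nodes..."

global_batch = (per_device_train_batch_size * gradient_accumulation_steps
                * gpus_per_node * num_nodes)
print(global_batch)  # 1024 samples per optimizer step

# warmup_steps=0 with warmup_ratio=0.01 means the warmup length is derived
# as 1% of the total number of training steps.
num_examples = 10_000_000  # hypothetical dataset size, for illustration only
num_train_epochs = 3.0
steps_per_epoch = num_examples // global_batch
total_steps = int(steps_per_epoch * num_train_epochs)
warmup_steps = int(total_steps * 0.01)
print(total_steps, warmup_steps)
```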
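Every dump points DeepSpeed at deepspeed=./scripts/zero1.json, whose contents never appear in the log. The following is only a plausible ZeRO stage-1 sketch consistent with bf16=True, gradient_accumulation_steps=1, and max_grad_norm=1.0; with the HF Trainer integration, "auto" values are filled in from TrainingArguments at runtime.

```python
# Guess at what ./scripts/zero1.json plausibly contains; the actual file is
# not shown in the log. "auto" fields are resolved by the HF Trainer from
# TrainingArguments when training starts.
import json

zero1_config = {
    "zero_optimization": {"stage": 1},          # ZeRO stage 1: shard optimizer states
    "bf16": {"enabled": "auto"},                # follows bf16=True in the args
    "train_micro_batch_size_per_gpu": "auto",   # per_device_train_batch_size
    "train_batch_size": "auto",                 # micro batch * grad_accum * world size
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",                # max_grad_norm
}

with open("zero1.json", "w") as f:
    json.dump(zero1_config, f, indent=2)
```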
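The reason the same argument block appears over and over is that every rank prints it, and log_on_each_node=True keeps per-node logging enabled as well. A common remedy is to gate such prints on the global rank; below is a minimal sketch, assuming torch.distributed has already been initialized by the launcher (rank0_print is an illustrative helper, not necessarily this codebase's).

```python
# Print only from global rank 0 so startup dumps appear once instead of
# once per process. Assumes the distributed process group is initialized
# by the launcher (torchrun / deepspeed) before this is called.
import torch.distributed as dist


def rank0_print(*args, **kwargs):
    """Print on global rank 0, or unconditionally when not distributed."""
    if not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)


# Usage at startup, replacing the bare print of the parsed dataclasses:
# rank0_print(model_args, data_args, training_args)
```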
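lr_scheduler_type=constant_with_warmup together with learning_rate=0.0001 and warmup_ratio=0.01 means the learning rate climbs linearly to 1e-4 over the first 1% of steps and then stays flat. Outside the Trainer, the equivalent schedule is exposed as transformers.get_constant_schedule_with_warmup; here is a self-contained toy example (the parameter and step counts are arbitrary).

```python
# constant_with_warmup: linear ramp from 0 to the base LR over the warmup
# steps, then a constant LR for the rest of training.
import torch
from transformers import get_constant_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]
# Optimizer hyperparameters mirror the dump: adamw, betas=(0.9, 0.999),
# eps=1e-8, weight_decay=0.0, learning_rate=0.0001.
optimizer = torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.999),
                              eps=1e-8, weight_decay=0.0)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)

for step in range(5):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # [5e-06]: 5/100 of the way up the ramp to 1e-4
```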
dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspeval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, 
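The run hands DeepSpeed the file `./scripts/zero1.json`, which is referenced but never reproduced in this log, so its exact contents are an assumption. A plausible ZeRO stage-1 file consistent with the Trainer flags above (bf16=True, max_grad_norm=1.0, per_device_train_batch_size=8) is sketched below; every key is a standard DeepSpeed option, and "auto" defers each value to the HF Trainer.

```python
# Plausible reconstruction of ./scripts/zero1.json -- the real file is not
# shown in this log, so treat the specific values here as assumptions.
import json

zero1 = {
    "bf16": {"enabled": "auto"},               # follows bf16=True in TrainingArguments
    "zero_optimization": {
        "stage": 1,                            # ZeRO-1: shard optimizer states only
        "overlap_comm": True,
        "reduce_bucket_size": "auto",
    },
    "gradient_accumulation_steps": "auto",     # 1 in this run
    "gradient_clipping": "auto",               # max_grad_norm=1.0
    "train_micro_batch_size_per_gpu": "auto",  # per_device_train_batch_size=8
    "train_batch_size": "auto",
}

with open("scripts/zero1.json", "w") as f:
    json.dump(zero1, f, indent=2)
```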
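The dumps above are the `repr`s of three dataclasses printed at startup on every rank. A minimal sketch of how LLaVA-style launchers typically define and parse them is below; the field subset and defaults are taken from the dump, while the scaffolding itself is illustrative rather than the actual training script.

```python
# Minimal sketch (assumption: LLaVA-style train.py) of the dataclasses whose
# reprs appear in this log. Only a representative subset of fields is shown.
from dataclasses import dataclass
from typing import Optional

import transformers

@dataclass
class ModelArguments:
    model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    version: str = "qwen"
    freeze_backbone: bool = True
    gen_vision_tower: str = "eva-clip-E-14-plus"
    mm_projector_type: str = "mlp2x_gelu"
    gen_projector_type: str = "mlp2x_gelu"
    n_query: int = 64
    n_und_query: int = 729
    gen_pooling: str = "early_pool2d_4"

@dataclass
class DataArguments:
    data_path: Optional[str] = None
    lazy_preprocess: bool = True
    is_multimodal: bool = False
    image_folder: Optional[str] = None
    data_type: str = "mix"
    image_aspect_ratio: str = "square"

@dataclass
class TrainingArguments(transformers.TrainingArguments):
    # custom fields layered on top of the stock HF TrainingArguments,
    # matching the non-standard keys visible in the dump
    bits: int = 16
    lora_enable: bool = False
    model_max_length: int = 2048
    mm_projector_lr: Optional[float] = None
    group_by_modality_length: bool = False

if __name__ == "__main__":
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    # every rank executes this print, which is why the log repeats the dumps
    print(model_args, data_args, training_args)
```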
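One useful sanity check hidden in these numbers is the effective global batch size. The local_rank values 0-7 imply 8 GPUs per node, and the run name advertises 16 nodes, so there are 128 ranks; with per_device_train_batch_size=8 and gradient_accumulation_steps=1, each optimizer step sees 1024 samples. Note also that with warmup_steps=0, the HF Trainer derives the warmup length from warmup_ratio=0.01 of the total step count.

```python
# Effective batch size implied by the dump. Node/GPU counts are inferred from
# the "16-nodes" run name and the observed local_rank range 0-7.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_nodes, gpus_per_node = 16, 8

world_size = num_nodes * gpus_per_node  # 128 ranks
global_batch = per_device_train_batch_size * gradient_accumulation_steps * world_size
print(world_size, global_batch)  # 128 1024
```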
pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-84, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, 
save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, 
load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-84, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, 
ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-36, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, 
gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, ect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, 
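The run delegates optimizer-state sharding to DeepSpeed via deepspeed=./scripts/zero1.json. That file's contents are not shown in the log; the following is a hypothetical ZeRO stage-1 configuration consistent with the logged flags (bf16=True, gradient_accumulation_steps=1, max_grad_norm=1.0), using DeepSpeed's "auto" placeholders, which the HF Trainer resolves from TrainingArguments at launch:

```python
# Hypothetical reconstruction of ./scripts/zero1.json; the actual file is
# not included in the log.
import json

zero1_config = {
    "zero_optimization": {"stage": 1},         # shard optimizer states only
    "bf16": {"enabled": "auto"},               # follows bf16=True
    "train_micro_batch_size_per_gpu": "auto",  # follows per_device_train_batch_size=8
    "gradient_accumulation_steps": "auto",     # follows gradient_accumulation_steps=1
    "gradient_clipping": "auto",               # follows max_grad_norm=1.0
}

with open("zero1.json", "w") as f:
    json.dump(zero1_config, f, indent=2)
```

ZeRO stage 1 shards only the optimizer states across data-parallel ranks, leaving parameters and gradients replicated on every GPU.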
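The original console output repeats the same triple of dumps once per process; the copies differ only in local_rank (0 through 7) and the node suffix in logging_dir (h100-st-p548xlarge-29, -32, -36, -82, ...), consistent with the 16-node, 8-GPU-per-node job named in run_name. An illustrative snippet of where those per-process values come from under a torchrun/DeepSpeed launcher:

```python
# Illustrative only: torchrun and the DeepSpeed launcher export these
# environment variables, which explain the per-rank variation in the dumps.
import os
import socket

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # 0..7 within one node
global_rank = int(os.environ.get("RANK", 0))       # 0..127 across all nodes
world_size = int(os.environ.get("WORLD_SIZE", 1))  # e.g. 128 for 16 x 8 GPUs

print(f"{socket.gethostname()} local_rank={local_rank} "
      f"rank={global_rank}/{world_size}")
```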
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-36, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-36, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, 
dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-32, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, 
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspDataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-32, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, 
lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, torch_empty_cache_steps=None, torchdynamo=None, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) load_best_model_at_end=False, local_rank=0, log_level=passive, 
log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-29, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_asppush_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, 
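deepspeed=./scripts/zero1.json points at a ZeRO stage-1 config that is not included in this log. A plausible minimal sketch, assuming the usual "auto" placeholders that the Hugging Face Trainer resolves from TrainingArguments (bf16=True, max_grad_norm=1.0, the batch sizes above); the real file may differ:

import json
import os

# Assumed contents of ./scripts/zero1.json; "auto" values are filled in
# by the HF Trainer's DeepSpeed integration at launch time.
zero1_config = {
    "zero_optimization": {"stage": 1},
    "bf16": {"enabled": "auto"},
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
}

os.makedirs("./scripts", exist_ok=True)
with open("./scripts/zero1.json", "w") as f:
    json.dump(zero1_config, f, indent=2)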
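The per-device settings pin down the effective global batch size once the world size is known. The run name ("...16-nodes...") and the observed local_rank values 0-7 suggest 16 nodes with 8 GPUs each; both counts are inferences from the log, not stated explicitly:

per_device_train_batch_size = 8   # from the TrainingArguments dump
gradient_accumulation_steps = 1   # from the dump
gpus_per_node = 8                 # inferred from local_rank in 0..7
num_nodes = 16                    # inferred from the run name "...16-nodes..."

world_size = gpus_per_node * num_nodes              # 128 ranks
global_batch = (per_device_train_batch_size
                * gradient_accumulation_steps
                * world_size)                       # 8 * 1 * 128 = 1024
print(world_size, global_batch)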
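gen_pooling='early_pool2d_4' reads as a 4x4 average pool applied early, on the 2D patch grid of the generation vision tower, before the gen projector. One hypothetical implementation of that reading; the operator choice and the dimensions below are assumptions, not taken from the training code:

import torch
import torch.nn.functional as F

def early_pool2d(patch_tokens: torch.Tensor, factor: int = 4) -> torch.Tensor:
    """Pool a (B, N, C) sequence of patch tokens on its 2D grid.

    Hypothetical reading of gen_pooling='early_pool2d_4': N must form a
    square grid; a factor x factor average pool shrinks the token count
    by factor**2 before projection.
    """
    b, n, c = patch_tokens.shape
    side = int(n ** 0.5)
    assert side * side == n, "expected a square patch grid"
    x = patch_tokens.transpose(1, 2).reshape(b, c, side, side)
    x = F.avg_pool2d(x, kernel_size=factor, stride=factor)
    return x.flatten(2).transpose(1, 2)  # (B, N // factor**2, C)

# e.g. a 64x64 patch grid -> 16x16 = 256 pooled tokens (channel dim arbitrary)
tokens = torch.randn(2, 4096, 1024)
print(early_pool2d(tokens).shape)  # torch.Size([2, 256, 1024])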
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-80, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, 
gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_tect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, 
datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, ype='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-30, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, 
lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-80, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 
'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, 
eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-30, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-80, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, 
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, 
greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-38, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': 
False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-38, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) 
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-38, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, 
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 
'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-33, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') 
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-33, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, )

ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4')

DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square')

TrainingArguments(
  _n_gpu=1,
  accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
  adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08,
  auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False,
  bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None,
  dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None,
  ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800,
  debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None,
  do_eval=False, do_predict=False, do_train=False, double_quant=True,
  eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None,
  fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1,
  freeze_mm_mlp_adapter=False,
  fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None,
  full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None,
  greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto,
  hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=,
  ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False,
  jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0,
  learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False,
  local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True,
  logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-33,
  logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps,
  lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=,
  lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup,
  max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None,
  model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False,
  num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None,
  output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1,
  per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False,
  push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=,
  quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'],
  restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None,
  run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4,
  save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1,
  seed=42, skip_memory_metrics=True, split_batches=None, tf32=True,
  torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None,
  tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False,
  warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0,
)

[The identical ModelArguments/DataArguments/TrainingArguments dump is emitted once per local rank; the copies differ only in local_rank (values 1-7 appear in this stretch) and the logging_dir suffix (Feb16_07-18-55/56 on hosts h100-st-p548xlarge-33, -35, -37, -38).]
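For context, the three argument blocks above are the standard per-rank startup print of a LLaVA-style launch script, which typically parses them from the command line with transformers.HfArgumentParser. The sketch below is a minimal, hypothetical reconstruction under that assumption (field lists heavily abbreviated); it is not the project's actual training entrypoint.

```python
# Hypothetical sketch of how the dumps above are usually produced
# in LLaVA-style training code (field lists abbreviated).
from dataclasses import dataclass
from typing import Optional
import transformers

@dataclass
class ModelArguments:
    model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    version: str = "qwen"
    freeze_backbone: bool = True
    gen_vision_tower: str = "eva-clip-E-14-plus"
    gen_pooling: str = "early_pool2d_4"
    n_query: int = 64
    n_und_query: int = 729

@dataclass
class DataArguments:
    data_path: Optional[str] = None
    lazy_preprocess: bool = True
    data_type: str = "mix"
    image_aspect_ratio: str = "square"

@dataclass
class TrainingArguments(transformers.TrainingArguments):
    # Extra fields seen in the dump that are not part of the stock
    # HF TrainingArguments (lora_*, bits, mm_projector_lr, ...).
    model_max_length: int = 2048
    lora_enable: bool = False
    mm_projector_lr: Optional[float] = None

if __name__ == "__main__":
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    # Each rank prints its parsed arguments, which is why the same dump
    # repeats once per local_rank in the log above.
    print(model_args, data_args, training_args)
```

Invoked per rank as e.g. `python train.py --output_dir ... --bf16 True --deepspeed ./scripts/zero1.json`, each process parses the same CLI and prints its own copy, so only launcher-injected values such as local_rank differ between dumps.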
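The deepspeed=./scripts/zero1.json flag hands optimizer-state sharding to DeepSpeed ZeRO stage 1. The file itself is not reproduced in the log, so the config below is only a representative guess consistent with the logged flags (bf16=True, gradient_accumulation_steps=1, max_grad_norm=1.0), not the actual scripts/zero1.json.

```python
# Representative ZeRO-1 config (a guess at what ./scripts/zero1.json may
# contain; the real file is not shown in the log). "auto" values are
# resolved by the HF Trainer from TrainingArguments at launch.
ds_zero1 = {
    "bf16": {"enabled": "auto"},               # follows bf16=True
    "train_micro_batch_size_per_gpu": "auto",  # per_device_train_batch_size=8
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",     # 1 in this run
    "gradient_clipping": "auto",               # max_grad_norm=1.0
    "zero_optimization": {
        "stage": 1,            # shard optimizer state only
        "overlap_comm": True,  # overlap reduction with backward pass
    },
}

# The HF Trainer also accepts the dict directly in place of a file path:
# transformers.TrainingArguments(..., deepspeed=ds_zero1)
```

Stage 1 shards only the AdamW optimizer state across ranks, which fits this run: the 7B backbone is frozen (freeze_backbone=True), so optimizer memory is modest and stages 2/3 are not needed.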
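On the schedule: lr_scheduler_type=constant_with_warmup with warmup_steps=0 and warmup_ratio=0.01 means the Trainer derives the warmup length from the total step count (unknown here, since max_steps=-1 and num_train_epochs drives it). A small sketch of that resolution, with an assumed total_steps:

```python
import math
import torch
from torch.optim import AdamW
from transformers import get_constant_schedule_with_warmup

total_steps = 10_000  # hypothetical; the log does not state the step count
warmup_ratio = 0.01   # from TrainingArguments
# HF behavior: warmup_steps=0 falls back to ceil(total_steps * warmup_ratio).
num_warmup_steps = math.ceil(total_steps * warmup_ratio)  # -> 100 steps

# Dummy parameter so the optimizer sketch is runnable; hyperparameters
# mirror learning_rate=0.0001, adam_beta1/2, adam_epsilon from the dump.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = AdamW([param], lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_constant_schedule_with_warmup(
    optimizer, num_warmup_steps=num_warmup_steps)
```

After the linear ramp over the first ~1% of steps, the learning rate stays flat at 1e-4 for the rest of training, consistent with weight_decay=0.0 and the fixed-LR adapter-training setup logged above.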
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-35, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, 
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 
'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-55_h100-st-p548xlarge-35, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') 
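The dump also references deepspeed=./scripts/zero1.json, whose contents are not echoed in the log. For orientation only, a ZeRO stage-1 config in the shape the Hugging Face DeepSpeed integration accepts might look like the following; every value here is an assumption, not the actual file ("auto" defers to the corresponding TrainingArguments value):

# Hypothetical ZeRO-1 config; NOT the contents of ./scripts/zero1.json.
zero1_config = {
    "zero_optimization": {"stage": 1},         # ZeRO-1: shard optimizer states only
    "bf16": {"enabled": "auto"},               # would match bf16=True in the dump
    "train_micro_batch_size_per_gpu": "auto",  # filled from per_device_train_batch_size
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",               # filled from max_grad_norm
}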
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-28, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-29, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, 
eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-28, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', 
pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-29, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, 
save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, 
load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-28, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4')TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 
'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-28, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, 
warmup_steps=0, weight_decay=0.0, ) DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, 
length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_07-18-56_h100-st-p548xlarge-28, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, 
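A quick back-of-the-envelope check of the effective global batch size implied by the dump above. The per-device batch size and accumulation steps are copied from the TrainingArguments; the 16-node x 8-GPU world size is an assumption inferred from the run_name ("...-16-nodes...") and the local_rank values 0-7 in the log, not something the dump states directly.

```python
# Effective global batch size implied by the TrainingArguments dump.
# world_size is an assumption (16 nodes x 8 GPUs); the other values are
# taken verbatim from the dump.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
world_size = 16 * 8  # assumed from run_name "...-16-nodes..." and local_rank 0-7

global_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)
print(global_batch_size)  # -> 1024 sequences per optimizer step
```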
Using conversation format: qwen   (printed identically by all 16 ranks)
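The sixteen identical "Using conversation format" lines, like the repeated argument dumps, come from every process printing unconditionally. A minimal sketch of a rank-0 guard that would silence the duplicates, assuming the launcher (torchrun or the DeepSpeed launcher) sets the standard RANK environment variable; the helper name is hypothetical, not from the training script:

```python
import os

def log_once(*objects) -> None:
    """Print only on global rank 0 to avoid per-process duplication.

    Hypothetical helper; assumes the RANK env var set by torchrun/DeepSpeed.
    Falls back to rank 0 for single-process runs.
    """
    if int(os.environ.get("RANK", "0")) == 0:
        for obj in objects:
            print(obj)

# e.g. log_once(model_args, data_args, training_args)
```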
Loading EVA ViT: eva-clip-E-14-plus
Pretrained: None   (printed by every rank; no pretrained vision-tower checkpoint is given, matching vision_tower_pretrained=None in the ModelArguments)
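The TrainingArguments reference deepspeed=./scripts/zero1.json, but the file itself never appears in the log. As a rough sketch only, a ZeRO stage-1 config consistent with the dumped settings (bf16=True, max_grad_norm=1.0, per-device batch 8) might look like the dict below; the "auto" values are filled in from TrainingArguments by the Hugging Face Trainer's DeepSpeed integration. This is an assumption about the file's contents, not the run's actual config.

```python
import json

# Assumed contents of ./scripts/zero1.json -- NOT taken from the log.
# "auto" defers each field to the corresponding TrainingArguments value.
zero1_config = {
    "zero_optimization": {"stage": 1},          # ZeRO stage 1: shard optimizer states only
    "bf16": {"enabled": "auto"},                # matches bf16=True in the dump
    "gradient_clipping": "auto",                # matches max_grad_norm=1.0
    "train_micro_batch_size_per_gpu": "auto",   # matches per_device_train_batch_size=8
    "gradient_accumulation_steps": "auto",      # matches gradient_accumulation_steps=1
}

with open("zero1.json", "w") as f:
    json.dump(zero1_config, f, indent=2)
```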