Fix long queue waits with mechanism to prevent running duplicate jobs4e8cc0a0

13 days ago

This PR introduces a mechanism to prevent running duplicate model merge jobs. It tracks active jobs by hashing the YAML configuration and checks if a new job matches an existing one. If a duplicate is detected, the user is prompted to either continue with the new job (canceling the old one) or abort the operation. A duplicate may be caused by a user losing connection to the Space, and the job may be stuck in queue. The next job they place may not finish because the previous job is still stuck. They may have to wait up to 6 hours (but usually ~2-4) for the Space to restart.

Key Changes:

Active Job Tracking: Jobs are tracked by their YAML configuration hash.
Duplicate Detection: Before starting a new job, the system checks for duplicates.
User Prompt: If a duplicate is detected, the user can choose to cancel the old job and continue with the new one.

Why This Change is Needed:

Prevents multiple identical merge jobs from running simultaneously, saving resources and avoiding long queues and delays.

Update app.py6b69fab7

Austinkeith2010

9 days ago

I second this

Austinkeith2010

9 days ago

checked over it and apparantly it doesnt actually stop the task, it just pretends to lmao

Blazgo

8 days ago

@Austinkeith2010 I'm not sure what you mean... Doesn't it work?

Austinkeith2010

8 days ago

This part assumes you have the ability to cancel the previous job if needed

        # In real implementation, you'd stop the old task/process here

Blazgo

7 days ago

Sorry, as you may have guessed that's ChatGPT-generated lol. I don't really know how to cancel a job, but if you do have some insights let me know.