Apply for community grant: Academic project (gpu)

#1
by Gen-Sim - opened

This is a project from a research group (under submission). We would love to have the GPU grant to generate video predictions so that the general public can play with our system. Thanks!

The description of our project is as follows.

Building interactive robotic video world models and policies is extremely challenging due to the requirements on data and physical fidelity. We propose Heterogeneous Masked Autoregression (HMA) for video dynamics modeling, to unify these two problems by pre-training from observations and action sequences in large-scale heterogeneous data across different robotic embodiments, domains, and tasks. We leverage masked autoregression and diffusion models for fast, high-quality, and controllable video predictions. Our interactive video generation achieves better visual fidelity and controllability than the previous state-of-the-art, with 15x faster speed in the real world. After post-training, this model can be used as a video simulator driven by low-level action inputs for evaluating policies and generating synthetic data. We also explore HMA as an autoregressive robotic policy.

The code will be uploaded shortly, and the GPU requirement is not significant.

Hi @Gen-Sim , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.

BTW, I can't find your grant request in our internal Slack channel, so I believe you didn't use the button in the Space settings. If you don't use it, there's no way for us to notice your request. docs

Thanks so much for the help! Is it possible to get a regular GPU instead of ZeroGPU? We found that the HF demo is about 10x slower than local deployment on a 4090. Thanks!

Hmm, that's weird. ZeroGPU Spaces listed here are running pretty fast and I don't think there's any significant slowdown in them. So, the speed issue is probably due to the implementation of this Space. Maybe it's running on CPU instead of using CUDA?
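A quick way to check (an illustrative diagnostic, not the Space's actual code): print which device PyTorch would actually use from inside the request handler.

```python
import torch

def pick_device() -> str:
    # Quick diagnostic: if this returns "cpu" when called inside the
    # Space's request handler, the model never reached the GPU and
    # inference is running on CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```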
BTW, it seems that this Space is currently throwing an error when you try to run it.

Thanks for getting back to me so quickly. I think the bug should be fixed now.

The running time is still significantly slower than here: https://205a34df53c9856cbd.gradio.live/. I'm not sure which part I missed. It should be using the GPU now.

Thanks for checking. But I'm still seeing this error in the log:

User clicked: right
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 256, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 72, in handle_input
    new_image = model(direction)  # Get a new image from the model
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 179, in gradio_handler
    return task(*args, **kwargs)
  File "/home/user/app/app.py", line 64, in model
    next_image = genie.step(action)['pred_next_frame']
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/sim/simulator.py", line 238, in step
    assert self.cached_latent_frames is not None and self.cached_actions is not None, \
AssertionError: Model is not prompted yet. Please call `set_initial_state` first.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 106, in <lambda>
    right.click(fn=lambda: handle_input("right"), outputs=image_display, show_progress='hidden')
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
    raise res.value
AssertionError: Model is not prompted yet. Please call `set_initial_state` first.
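The assertion says `step` ran before the simulator was prompted. One way to guard against that is to prompt lazily on the first click (a hypothetical sketch: `set_initial_state` and `step` follow the traceback above, but the stub simulator and the initial state below are illustrative stand-ins, not the app's real code):

```python
# Stand-in for the real GenieSimulator, mirroring the cached-state
# fields and the assertion seen in the traceback.
class StubSimulator:
    def __init__(self):
        self.cached_latent_frames = None
        self.cached_actions = None

    def set_initial_state(self, state):
        self.cached_latent_frames, self.cached_actions = state

    def step(self, action):
        assert self.cached_latent_frames is not None and self.cached_actions is not None, \
            "Model is not prompted yet. Please call `set_initial_state` first."
        return {"pred_next_frame": f"frame-after-{action}"}

genie = StubSimulator()
INITIAL_STATE = (["latent0"], ["noop"])  # illustrative initial prompt

def handle_input(direction):
    # Lazily prompt the simulator before the first step, so the
    # assertion can never fire even if startup code didn't run.
    if genie.cached_latent_frames is None:
        genie.set_initial_state(INITIAL_STATE)
    return genie.step(direction)["pred_next_frame"]
```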

Very strange. The code behaves differently on Spaces than it does locally.

I added an initialization at the beginning. The error should be gone now, but the simulation still flickers around.
It looks like initializing the simulator inside a callback function https://huggingface.co/spaces/liruiw/hma/blob/main/app.py#L66 does not work.

Nice! Looks like the Space is running fast too.

> but the simulation still flickers around.
> It looks like initializing the simulator inside a callback function https://huggingface.co/spaces/liruiw/hma/blob/main/app.py#L66 does not work.

Oh.

Ah, I think that's because you are updating the state of the global variable `genie`. On Spaces, multiple users might use your app simultaneously, so if one user updates a global variable, it would affect the other users.

Maybe you can use gr.State. https://www.gradio.app/docs/gradio/state

> On Spaces, multiple users might use your app simultaneously, so if one user updates a global variable, it would affect the other users.

Indeed, this is the key problem. It's still not real-time compared to local, but let me try using gr.State first.

Could you help take a look at the GPU errors? I'm not fully sure I'm using the Spaces GPU correctly now. It also fails whenever I call reset().

I tried gr.State and it doesn't change this behavior.

Hmm, I'm not sure what the error is, but on ZeroGPU Spaces, CUDA is only available inside functions decorated with @spaces.GPU, so maybe you are using CUDA somewhere outside of them?
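To illustrate the pattern (a sketch, not your Space's actual code; the `ImportError` fallback exists only so the snippet also runs outside a Space):

```python
try:
    import spaces
    gpu = spaces.GPU   # on ZeroGPU, CUDA exists only inside decorated calls
except ImportError:    # local fallback so this sketch still runs
    gpu = lambda fn: fn

@gpu
def predict(direction: str) -> str:
    # Do model.to("cuda") and inference inside this function, never at
    # import time. The real model call is replaced by a placeholder here.
    return f"pred-{direction}"
```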

Anyway, as for the use of gr.State, I was thinking of making GenieSimulator stateless and adding external variables to keep its state, like:

with gr.Blocks() as demo:
    init_prompt = gr.State()
    ...
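For example (an illustrative sketch extending the snippet above, not your app's actual code), the handler takes the per-session value in and returns it out, so no module-level `genie` is shared between users:

```python
# Illustrative: all mutable state travels through the gr.State value,
# so each browser session gets its own independent copy.
def handle_input(direction, session):
    if session is None:                   # first click in this session
        session = {"history": []}         # stand-in for set_initial_state
    session["history"].append(direction)
    frame = f"frame-after-{direction}"    # stand-in for the predicted frame
    return frame, session

# Wiring inside `with gr.Blocks() as demo:` (sketch):
#   session = gr.State()
#   right.click(fn=lambda s: handle_input("right", s),
#               inputs=session, outputs=[image_display, session])
```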

Thanks for the reply! I still haven't fixed this issue, and I confirmed that an earlier version of the code also doesn't work (so something in the HF interface changed).
