NYU VisionX

university

https://www.sainingxie.com/

Activity Feed

AI & ML interests

None defined yet.

nyu-visionx's activity

xcpan

updated a dataset about 20 hours ago

nyu-visionx/oro_optical_results

Viewer • Updated about 20 hours ago • 892k • 6

xcpan

published a dataset about 20 hours ago

nyu-visionx/oro_optical_results

Viewer • Updated about 20 hours ago • 892k • 6

sayakpaul

posted an update 4 days ago

Post

1661

We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
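
For readers unfamiliar with the idea, extracting a LoRA from a fully fine-tuned checkpoint is commonly done by factorizing the weight difference between the fine-tuned and base models at low rank. The snippet below is a minimal NumPy sketch of that general idea using a truncated SVD on a toy matrix. It is an illustration under stated assumptions, not the code in the linked Diffusers script: the function name `extract_lora`, the matrix shapes, and the choice to split the singular values evenly across both factors are all hypothetical.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Return (lora_up, lora_down) such that lora_up @ lora_down
    is the best rank-`rank` approximation of w_tuned - w_base.

    Hypothetical helper for illustration only.
    """
    delta = w_tuned - w_base
    # Thin SVD of the weight difference: delta = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Keep the top-`rank` singular directions and split sqrt(s)
    # across the two factors (an arbitrary but common convention).
    root = np.sqrt(s[:rank])
    lora_up = u[:, :rank] * root            # shape: (out_features, rank)
    lora_down = root[:, None] * vt[:rank]   # shape: (rank, in_features)
    return lora_up, lora_down


# Toy example: a base weight plus an update that is exactly rank 4.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
true_update = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
w_tuned = w_base + true_update

up, down = extract_lora(w_base, w_tuned, rank=4)
rel_err = np.linalg.norm(up @ down - true_update) / np.linalg.norm(true_update)
print("relative reconstruction error:", rel_err)
```

In this toy case the update is exactly rank 4, so a rank-4 extraction recovers it up to floating-point error. With a real fine-tuned model the difference is generally not low rank, and the chosen rank trades adapter size against fidelity.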

1 reply

xcpan

updated a dataset 5 days ago

nyu-visionx/oro_dino_results

Updated 5 days ago • 14

xcpan

published a dataset 5 days ago

nyu-visionx/oro_dino_results

Updated 5 days ago • 14

sainx

authored a paper 5 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 6 days ago • 88

jihanyang

authored a paper 5 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 6 days ago • 88

sayakpaul

posted an update 7 days ago

Post

1856

We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo, and the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

craigwu

authored a paper 10 days ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 11 days ago • 22

sainx

authored a paper 17 days ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published 18 days ago • 67

jihanyang

updated a dataset 20 days ago

nyu-visionx/VSI-Bench

Viewer • Updated 20 days ago • 5.13k • 1.88k • 30

sayakpaul

posted an update about 1 month ago

Post

4323

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

anjaliwgupta

authored a paper about 1 month ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

jihanyang

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

rilynhan

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

sainx

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

ShushengYang

authored 4 papers about 2 months ago

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

Paper • 2204.02964 • Published Apr 6, 2022

Team members 15
