FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
Abstract
Virtual film production requires intricate decision-making, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in automated decision-making with societies of language agents, this paper introduces FilmAgent, a novel LLM-based multi-agent collaborative framework for end-to-end film automation in our constructed 3D virtual spaces. FilmAgent simulates various crew roles, including directors, screenwriters, actors, and cinematographers, and covers the key stages of the film production workflow: (1) idea development transforms brainstormed ideas into structured story outlines; (2) scriptwriting elaborates on dialogue and character actions for each scene; (3) cinematography determines the camera setups for each shot. A team of agents collaborates through iterative feedback and revision, verifying intermediate scripts and reducing hallucinations. We evaluate the generated videos on 15 ideas across 4 key aspects. Human evaluation shows that FilmAgent outperforms all baselines on every aspect, scoring 3.98 out of 5 on average and demonstrating the feasibility of multi-agent collaboration in filmmaking. Further analysis reveals that FilmAgent, despite using the less advanced GPT-4o model, surpasses the single-agent o1, underscoring the advantage of a well-coordinated multi-agent system. Lastly, we discuss the complementary strengths and weaknesses of OpenAI's text-to-video model Sora and FilmAgent in filmmaking.
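The three-stage workflow described above maps naturally onto a small set of role-conditioned agents exchanging drafts and critiques. The following is a minimal Python sketch of such a pipeline; the names (`Agent`, `critique_and_revise`, `film_pipeline`) are illustrative assumptions, not the actual FilmAgent codebase, and the LLM call is stubbed out.

```python
# Hypothetical sketch of a FilmAgent-style three-stage pipeline.
# Class and function names are illustrative, not the repository's actual API.
from dataclasses import dataclass


@dataclass
class Agent:
    role: str  # e.g. "director", "screenwriter", "cinematographer"

    def act(self, prompt: str) -> str:
        # Placeholder for a role-conditioned LLM call (e.g. GPT-4o).
        return f"[{self.role}] response to: {prompt}"


def critique_and_revise(draft: str, author: Agent, critic: Agent,
                        rounds: int = 2) -> str:
    """Iterative feedback loop: a critic reviews each draft and the author
    revises, verifying intermediate outputs before the next stage."""
    for _ in range(rounds):
        feedback = critic.act(f"Review this draft for errors:\n{draft}")
        draft = author.act(f"Revise using this feedback:\n{feedback}\n{draft}")
    return draft


def film_pipeline(idea: str) -> str:
    director = Agent("director")
    screenwriter = Agent("screenwriter")
    cinematographer = Agent("cinematographer")

    # Stage 1: idea development -> structured story outline.
    outline = critique_and_revise(
        screenwriter.act(f"Turn this idea into a scene-by-scene outline: {idea}"),
        author=screenwriter, critic=director)

    # Stage 2: scriptwriting -> dialogue and character actions per scene.
    script = critique_and_revise(
        screenwriter.act(f"Write dialogue and actions for: {outline}"),
        author=screenwriter, critic=director)

    # Stage 3: cinematography -> camera setup for each shot.
    return critique_and_revise(
        cinematographer.act(f"Choose camera setups for each shot in: {script}"),
        author=cinematographer, critic=director)


if __name__ == "__main__":
    print(film_pipeline("Everyone gathers firewood and the flames rise"))
```

The design choice mirrored here is that each intermediate artifact (outline, script, shot list) passes through a critic-author loop before the next stage consumes it, which is how the paper reports verifying intermediate scripts and reducing hallucinations.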
Community
🎬 Meet FilmAgent – A multi-agent framework for automating film production end-to-end in 3D virtual spaces 🌐
Highlights:
🔹 Multi-agent collaboration ensures script quality 💡
🔹 Physics-compliant, story-rich video outputs 🎥
🔹 Comparisons with o1 & Sora, showcasing "Everyone gathers firewood and the flames rise" (众人拾柴火焰高) 🔥
Paper: https://arxiv.org/abs/2501.12909
Github: https://github.com/HITsz-TMG/FilmAgent
Website: https://filmagent.github.io/
Video: https://www.youtube.com/watch?v=hTI-0777iHU
Feel free to make your own film!
⭐ Star us on GitHub to stay updated! New versions in the making...
Librarian Bot: The following similar papers were recommended by the Semantic Scholar API.
- GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration (2024)
- VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation (2024)
- DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation (2024)
- PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents (2025)
- SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters (2024)
- Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop (2024)
- DirectorLLM for Human-Centric Video Generation (2024)