# CodeAct Agent Framework This folder is an implementation of OpenHands's main agent, the CodeAct Agent. It is based on ([CodeAct](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)), an idea of consolidating LLM agents' **act**ions into a unified **code** action space for both *simplicity* and *performance*. ## Overview The CodeAct agent operates through a function calling interface. At each turn, the agent can: 1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc. 2. **CodeAct**: Execute actions through a set of well-defined tools: - Execute Linux `bash` commands with `execute_bash` - Run Python code in an [IPython](https://ipython.org/) environment with `execute_ipython_cell` - Interact with web browsers using `browser` and `web_read` - Edit files using `str_replace_editor` or `edit_file` ![image](https://github.com/All-Hands-AI/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3) ## Built-in Tools The agent provides several built-in tools: ### 1. `execute_bash` - Execute any valid Linux bash command - Handles long-running commands by running them in background with output redirection - Supports interactive processes with STDIN input and process interruption - Handles command timeouts with automatic retry in background mode ### 2. `execute_ipython_cell` - Run Python code in an IPython environment - Supports magic commands like `%pip` - Variables are scoped to the IPython environment - Requires defining variables and importing packages before use ### 3. `web_read` and `browser` - `web_read`: Read and convert webpage content to markdown - `browser`: Interact with webpages through Python code - Supports common browser actions like navigation, clicking, form filling, scrolling - Handles file uploads and drag-and-drop operations ### 4. `str_replace_editor` - View, create and edit files through string replacement - Persistent state across command calls - File viewing with line numbers - String replacement with exact matching - Undo functionality for edits ### 5. `edit_file` (LLM-based) - Edit files using LLM-based content generation - Support for partial file edits with line ranges - Handles large files by editing specific sections - Append mode for adding content to files ## Configuration Tools can be enabled/disabled through configuration parameters: - `codeact_enable_browsing`: Enable browser interaction tools - `codeact_enable_jupyter`: Enable IPython code execution - `codeact_enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled) ## Micro-agents The agent includes specialized micro-agents for specific tasks: 1. **npm**: Handles npm package installation with non-interactive shell workarounds 2. **github**: Manages GitHub operations with API token support and PR creation guidelines 3. **flarglebargle**: Easter egg response handler ## Adding New Tools The CodeAct agent uses a function calling interface based on `litellm`'s `ChatCompletionToolParam`. To add a new tool: 1. Define the tool in `function_calling.py`: ```python MyTool = ChatCompletionToolParam( type='function', function=ChatCompletionToolParamFunctionChunk( name='my_tool', description='Description of what the tool does and how to use it', parameters={ 'type': 'object', 'properties': { 'param1': { 'type': 'string', 'description': 'Description of parameter 1', }, 'param2': { 'type': 'integer', 'description': 'Description of parameter 2', }, }, 'required': ['param1'], # List required parameters here }, ), ) ``` 2. Add the tool to `get_tools()` in `function_calling.py` 3. Implement the corresponding action handler in the agent class ## Implementation Details The agent is implemented in two main files: 1. `codeact_agent.py`: Core agent implementation with: - Message history management - Tool execution handling - State management - Action/observation processing 2. `function_calling.py`: Tool definitions and function calling interface with: - Tool parameter specifications - Tool descriptions and examples - Function calling response parsing