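Agents & Tools
Transformers Agents is an experimental API that is subject to change at any time. Because the API and the underlying models are updated frequently, the results returned by agents can vary.
To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.
Agents
We provide two types of agents, based on the main Agent class:
- CodeAgent acts in one shot: it generates code to solve the task, then executes it at once.
- ReactAgent acts step by step, each step consisting of one thought, then one tool call and execution. It has two classes:
  - ReactJsonAgent writes its tool calls in JSON.
  - ReactCodeAgent writes its tool calls in Python code.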
Agent
class transformers.Agent
(tools: typing.Union[typing.List[transformers.agents.tools.Tool], transformers.agents.agents.Toolbox], llm_engine: typing.Callable = None, system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, additional_args: typing.Dict = {}, max_iterations: int = 6, tool_parser: typing.Optional[typing.Callable] = None, add_base_tools: bool = False, verbose: int = 0, grammar: typing.Optional[typing.Dict[str, str]] = None, managed_agents: typing.Optional[typing.List] = None, step_callbacks: typing.Optional[typing.List[typing.Callable]] = None, monitor_metrics: bool = True)
execute_tool_call
(tool_name: str, arguments: typing.Dict[str, str])
Executes the tool with the provided input and returns the result. This method replaces arguments with the actual values from the state if they refer to state variables.
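The state substitution described above can be pictured with a small stand-alone sketch. It is not the library's implementation: resolve_arguments, execute_tool_call_sketch, the state dict and the toy "repeat" tool are hypothetical names, and the sketch assumes an argument refers to a state variable when its string value matches a key in the state.

```python
from typing import Any, Callable, Dict


def resolve_arguments(arguments: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    """Replace string arguments that name a state variable with the stored value."""
    resolved = {}
    for key, value in arguments.items():
        # Assumption: a string equal to a state key is a reference to that variable.
        if isinstance(value, str) and value in state:
            resolved[key] = state[value]
        else:
            resolved[key] = value
    return resolved


def execute_tool_call_sketch(
    tools: Dict[str, Callable[..., Any]],
    tool_name: str,
    arguments: Dict[str, Any],
    state: Dict[str, Any],
) -> Any:
    """Look up the tool by name and call it with the resolved arguments."""
    if tool_name not in tools:
        raise KeyError(f"Unknown tool: {tool_name}")
    return tools[tool_name](**resolve_arguments(arguments, state))


# Hypothetical toy tool and state, for illustration only.
tools = {"repeat": lambda text, times: text * int(times)}
state = {"greeting": "hi "}
result = execute_tool_call_sketch(tools, "repeat", {"text": "greeting", "times": "2"}, state)
print(repr(result))
```

Here "greeting" is looked up in the state while "2" is passed through unchanged, because it does not match any state key.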
extract_action
(llm_output: str, split_token: str)
Parses the action from the LLM output.
To be implemented in the child class
Reads past llm_outputs, actions, and observations or errors from the logs into a series of messages that can be used as input to the LLM.
CodeAgent
class transformers.CodeAgent
(tools: typing.List[transformers.agents.tools.Tool], llm_engine: typing.Optional[typing.Callable] = None, system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, grammar: typing.Optional[typing.Dict[str, str]] = None, additional_authorized_imports: typing.Optional[typing.List[str]] = None, **kwargs)
A class for an agent that solves the given task using a single block of code. It plans all its actions, then executes all in one shot.
Override this method if you want to change the way the code is cleaned in the run method.
run
(task: str, return_generated_code: bool = False, **kwargs)
Runs the agent for the given task.
React agents
class transformers.ReactAgent
(tools: typing.List[transformers.agents.tools.Tool], llm_engine: typing.Optional[typing.Callable] = None, system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, grammar: typing.Optional[typing.Dict[str, str]] = None, plan_type: typing.Optional[str] = None, planning_interval: typing.Optional[int] = None, **kwargs)
This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The action is parsed from the LLM output: it consists of calls to tools from the toolbox, with arguments chosen by the LLM engine.
Runs the agent in direct mode, returning outputs only at the end: should be launched only in the run method.
planning_step
(task, is_first_step: bool = False, iteration: int = None)
Used periodically by the agent to plan the next steps to reach the objective.
This method provides a final answer to the task, based on the logs of the agent’s interactions.
run
(task: str, stream: bool = False, reset: bool = True, **kwargs)
Runs the agent for the given task.
Runs the agent in streaming mode, yielding steps as they are executed: should be launched only in the run method.
class transformers.ReactJsonAgent
(tools: typing.List[transformers.agents.tools.Tool], llm_engine: typing.Optional[typing.Callable] = None, system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, grammar: typing.Optional[typing.Dict[str, str]] = None, planning_interval: typing.Optional[int] = None, **kwargs)
This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The tool calls are formulated by the LLM in JSON format, then parsed and executed.
Performs one step in the ReAct framework: the agent thinks, acts, and observes the result. Errors are raised here; they are caught and logged in the run() method.
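To make the think, act, observe cycle concrete, here is a minimal sketch of one such step with a JSON-formatted tool call. Everything in it is an assumption made for illustration: scripted_llm stands in for a real LLM engine, the "Action:" split token and the "action" / "action_input" keys are one plausible layout, and react_json_step is not the library's method.

```python
import json
from typing import Any, Callable, Dict, List


def scripted_llm(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for an LLM engine: always returns the same output."""
    return (
        "Thought: I should add the numbers.\n"
        'Action: {"action": "add", "action_input": {"a": 2, "b": 3}}'
    )


def react_json_step(
    llm: Callable[[List[Dict[str, str]]], str],
    tools: Dict[str, Callable[..., Any]],
    messages: List[Dict[str, str]],
    split_token: str = "Action:",
) -> Dict[str, Any]:
    """Run one think/act/observe step; errors propagate to the caller."""
    llm_output = llm(messages)
    # Text before the split token is the thought, text after it is the action.
    thought, _, action_text = llm_output.partition(split_token)
    if not action_text:
        raise ValueError("No action found in the LLM output")
    action = json.loads(action_text)
    observation = tools[action["action"]](**action["action_input"])
    return {"thought": thought.strip(), "tool": action["action"], "observation": observation}


# Hypothetical toy tool, for illustration only.
tools = {"add": lambda a, b: a + b}
step = react_json_step(scripted_llm, tools, [{"role": "user", "content": "What is 2 + 3?"}])
print(step["observation"])
```

A surrounding loop would call such a step repeatedly, catching and logging any exception it raises, until a final answer is produced or an iteration limit is hit.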
class transformers.ReactCodeAgent
(tools: typing.List[transformers.agents.tools.Tool], llm_engine: typing.Optional[typing.Callable] = None, system_prompt: typing.Optional[str] = None, tool_description_template: typing.Optional[str] = None, grammar: typing.Optional[typing.Dict[str, str]] = None, additional_authorized_imports: typing.Optional[typing.List[str]] = None, planning_interval: typing.Optional[int] = None, **kwargs)
This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The tool calls are formulated by the LLM in code format, then parsed and executed.
Performs one step in the ReAct framework: the agent thinks, acts, and observes the result. Errors are raised here; they are caught and logged in the run() method.
Tools
load_tool
transformers.load_tool
(task_or_repo_id, model_repo_id = None, token = None, **kwargs)
Parameters
- task_or_repo_id (str) — The task for which to load the tool or a repo ID of a tool on the Hub. Tasks implemented in Transformers are:
  - "document_question_answering"
  - "image_question_answering"
  - "speech_to_text"
  - "text_to_speech"
  - "translation"
- model_repo_id (str, optional) — Use this argument to use a different model than the default one for the tool you selected.
- token (str, optional) — The token to identify you on hf.co. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- kwargs (additional keyword arguments, optional) — Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as cache_dir, revision, subfolder) will be used when downloading the files for your tool, and the others will be passed along to its init.
Main function to quickly load a tool, be it on the Hub or in the Transformers library.
Loading a tool means that you’ll download the tool and execute it locally. ALWAYS inspect the tool you’re downloading before loading it within your runtime, as you would do when installing a package using pip/npm/apt.
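The way the extra keyword arguments are split in two can be sketched as follows. This is only an illustration, not the library's code: HUB_KEYS contains just the three names the parameter description gives as examples (the real set may be larger), and split_kwargs is a hypothetical helper.

```python
from typing import Any, Dict, Tuple

# Assumption: only the three Hub-related names mentioned as examples.
HUB_KEYS = {"cache_dir", "revision", "subfolder"}


def split_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate Hub download options from arguments meant for the tool's init."""
    hub_kwargs = {k: v for k, v in kwargs.items() if k in HUB_KEYS}
    init_kwargs = {k: v for k, v in kwargs.items() if k not in HUB_KEYS}
    return hub_kwargs, init_kwargs


hub_kwargs, init_kwargs = split_kwargs(
    {"revision": "main", "cache_dir": "/tmp/tools", "device": "cpu"}
)
print(hub_kwargs)   # options used when downloading the tool files
print(init_kwargs)  # options passed along to the tool's init
```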
Tool
A base class for the functions used by the agent. Subclass this and implement the __call__ method as well as the following class attributes:
- description (str) — A short description of what your tool does, the inputs it expects and the output(s) it will return. For instance ‘This is a tool that downloads a file from a url. It takes the url as input, and returns the text contained in the file’.
- name (str) — A performative name that will be used for your tool in the prompt to the agent. For instance "text-classifier" or "image_generator".
- inputs (Dict[str, Dict[str, Union[str, type]]]) — The dict of modalities expected for the inputs. It has one type key and a description key. This is used by launch_gradio_demo or to make a nice space from your tool, and also can be used in the generated description for your tool.
- output_type (type) — The type of the tool output. This is used by launch_gradio_demo or to make a nice space from your tool, and also can be used in the generated description for your tool.
You can also override the method setup() if your tool has an expensive operation to perform before being usable (such as loading a model). setup() will be called the first time you use your tool, but not at instantiation.
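The class attributes and the deferred setup() call can be illustrated with a stand-alone sketch. LazyToolSketch is a hypothetical stand-in that does not inherit from the real Tool class; it only mirrors the attribute names listed above and assumes that the first call triggers setup().

```python
class LazyToolSketch:
    """Hypothetical stand-in for a tool; not the library's Tool base class."""

    name = "word_counter"
    description = (
        "This is a tool that counts the words in a text. "
        "It takes the text as input, and returns the number of words."
    )
    inputs = {"text": {"type": str, "description": "The text to count words in"}}
    output_type = int

    def __init__(self):
        # Nothing expensive happens at instantiation.
        self.is_initialized = False
        self.setup_calls = 0

    def setup(self):
        # Placeholder for an expensive operation such as loading a model.
        self.setup_calls += 1
        self.is_initialized = True

    def __call__(self, text: str) -> int:
        # Run setup() lazily, on first use only.
        if not self.is_initialized:
            self.setup()
        return len(text.split())


tool = LazyToolSketch()
print(tool.setup_calls)       # setup() has not run yet
print(tool("one two three"))  # first call triggers setup()
print(tool.setup_calls)
```

Deferring the expensive work this way keeps instantiation cheap, which matters when many tools are created but only a few are actually called.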
Creates a Tool from a gradio tool.
from_hub
(repo_id: str, token: typing.Optional[str] = None, **kwargs)
Parameters
- repo_id (str) — The name of the repo on the Hub where your tool is defined.
- token (str, optional) — The token to identify you on hf.co. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- kwargs (additional keyword arguments, optional) — Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as cache_dir, revision, subfolder) will be used when downloading the files for your tool, and the others will be passed along to its init.
Loads a tool defined on the Hub.
Loading a tool from the Hub means that you’ll download the tool and execute it locally. ALWAYS inspect the tool you’re downloading before loading it within your runtime, as you would do when installing a package using pip/npm/apt.
Creates a Tool from a langchain tool.
from_space
(space_id: str, name: str, description: str, api_name: typing.Optional[str] = None, token: typing.Optional[str] = None) → Tool
Parameters
- space_id (str) — The id of the Space on the Hub.
- name (str) — The name of the tool.
- description (str) — The description of the tool.
- api_name (str, optional) — The specific api_name to use, if the space has several tabs. If not specified, will default to the first available api.
- token (str, optional) — Add your token to access private spaces or increase your GPU quotas.
Returns
The Space, as a tool.
Creates a Tool from a Space given its id on the Hub.
push_to_hub
(repo_id: str, commit_message: str = 'Upload tool', private: typing.Optional[bool] = None, token: typing.Union[bool, str, NoneType] = None, create_pr: bool = False)
Parameters
- repo_id (str) — The name of the repository you want to push your tool to. It should contain your organization name when pushing to a given organization.
- commit_message (str, optional, defaults to "Upload tool") — Message to commit while pushing.
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
Upload the tool to the Hub.
For this method to work properly, your tool must have been defined in a separate module (not __main__).
save
(output_dir)
Saves the relevant code files for your tool so it can be pushed to the Hub. This will copy the code of your tool in output_dir as well as autogenerate:
- a config file named tool_config.json
- an app.py file so that your tool can be converted to a space
- a requirements.txt containing the names of the modules used by your tool (as detected when inspecting its code)
You should only use this method to save tools that are defined in a separate module (not __main__).
Overwrite this method here for any operation that is expensive and needs to be executed before you start using your tool, such as loading a big model.
Toolbox
class transformers.Toolbox
(tools: typing.List[transformers.agents.tools.Tool], add_base_tools: bool = False)
The toolbox contains all tools that the agent can perform operations with, as well as a few methods to manage them.
Adds a tool to the toolbox.
Clears the toolbox.
remove_tool
(tool_name: str)
Removes a tool from the toolbox.
show_tool_descriptions
(tool_description_template: str = None)
Returns the description of all tools in the toolbox.
Updates a tool in the toolbox according to its name.
PipelineTool
class transformers.PipelineTool
(model = None, pre_processor = None, post_processor = None, device = None, device_map = None, model_kwargs = None, token = None, **hub_kwargs)
Parameters
- model (str or PreTrainedModel, optional) — The name of the checkpoint to use for the model, or the instantiated model. If unset, will default to the value of the class attribute default_checkpoint.
- pre_processor (str or Any, optional) — The name of the checkpoint to use for the pre-processor, or the instantiated pre-processor (can be a tokenizer, an image processor, a feature extractor or a processor). Will default to the value of model if unset.
- post_processor (str or Any, optional) — The name of the checkpoint to use for the post-processor, or the instantiated post-processor (can be a tokenizer, an image processor, a feature extractor or a processor). Will default to the pre_processor if unset.
- device (int, str or torch.device, optional) — The device on which to execute the model. Will default to any accelerator available (GPU, MPS etc…), the CPU otherwise.
- device_map (str or dict, optional) — If passed along, will be used to instantiate the model.
- model_kwargs (dict, optional) — Any keyword argument to send to the model instantiation.
- token (str, optional) — The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- hub_kwargs (additional keyword arguments, optional) — Any additional keyword argument to send to the methods that will load the data from the Hub.
A Tool tailored towards Transformer models. On top of the class attributes of the base class Tool, you will need to specify:
- model_class (type) — The class to use to load the model in this tool.
- default_checkpoint (str) — The default checkpoint that should be used when the user doesn’t specify one.
- pre_processor_class (type, optional, defaults to AutoProcessor) — The class to use to load the pre-processor.
- post_processor_class (type, optional, defaults to AutoProcessor) — The class to use to load the post-processor (when different from the pre-processor).
Uses the post_processor to decode the model output.
Uses the pre_processor to prepare the inputs for the model.
Sends the inputs through the model.
Instantiates the pre_processor, model and post_processor if necessary.
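The three stages described above (prepare the inputs, send them through the model, decode the output) compose as in the following sketch. It uses plain Python functions in place of a real pre-processor, model and post-processor; run_pipeline and the toy vocabulary are hypothetical and only show the order of the calls.

```python
from typing import Callable, List

# Hypothetical toy vocabulary standing in for a tokenizer.
VOCAB = {"hello": 0, "world": 1}
INVERSE_VOCAB = {index: word for word, index in VOCAB.items()}


def toy_pre_processor(text: str) -> List[int]:
    """Prepare the inputs for the model: map words to ids."""
    return [VOCAB[word] for word in text.split()]


def toy_model(ids: List[int]) -> List[int]:
    """Stand-in for a model forward pass: reverse the sequence."""
    return list(reversed(ids))


def toy_post_processor(ids: List[int]) -> str:
    """Decode the model output back into text."""
    return " ".join(INVERSE_VOCAB[i] for i in ids)


def run_pipeline(
    inputs: str,
    pre_processor: Callable[[str], List[int]],
    model: Callable[[List[int]], List[int]],
    post_processor: Callable[[List[int]], str],
) -> str:
    encoded = pre_processor(inputs)
    outputs = model(encoded)
    return post_processor(outputs)


decoded = run_pipeline("hello world", toy_pre_processor, toy_model, toy_post_processor)
print(decoded)
```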
launch_gradio_demo
transformers.launch_gradio_demo
(tool_class: Tool)
Launches a gradio demo for a tool. The corresponding tool class needs to properly implement the class attributes inputs and output_type.
ToolCollection
class transformers.ToolCollection
(collection_slug: str, token: typing.Optional[str] = None)
Tool collections enable loading all Spaces from a collection in order to be added to the agent’s toolbox.
[!NOTE] Only Spaces will be fetched, so you can feel free to add models and datasets to your collection if you’d like for this collection to showcase them.
Example:
>>> from transformers import ToolCollection, ReactCodeAgent
>>> image_tool_collection = ToolCollection(collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f")
>>> agent = ReactCodeAgent(tools=[*image_tool_collection.tools], add_base_tools=True)
>>> agent.run("Please draw me a picture of rivers and lakes.")
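Engines
You are free to create and use your own engines with the Agents framework. These engines must meet the following specification:
- Follow the messages format for the input (List[Dict[str, str]]) and return a string.
- Stop generating output before any of the sequences passed in the stop_sequences argument.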
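A custom engine that satisfies these two requirements can be as small as a single function. The sketch below is a toy that calls no model: canned_engine and its reply text are hypothetical, and the only point is the shape of the interface (a list of message dicts in, a string out, truncated before any stop sequence).

```python
from typing import Dict, List, Optional


def canned_engine(
    messages: List[Dict[str, str]],
    stop_sequences: Optional[List[str]] = None,
) -> str:
    """Toy engine: echoes the last user message, then honors stop_sequences."""
    last_user = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    # Hypothetical raw generation; a real engine would call a model here.
    raw_output = f"You said: {last_user} Observation: done"
    # Cut the output right before the first occurrence of each stop sequence.
    for stop in stop_sequences or []:
        index = raw_output.find(stop)
        if index != -1:
            raw_output = raw_output[:index]
    return raw_output


messages = [{"role": "user", "content": "ping"}]
print(repr(canned_engine(messages)))
print(repr(canned_engine(messages, stop_sequences=["Observation:"])))
```

Truncating before the stop sequence, rather than after it, keeps the agent from reading text that the model invented in place of a real tool observation.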
HfApiEngine
For convenience, we have added HfApiEngine, which implements the points above and uses an inference endpoint to run the LLM.
>>> from transformers import HfApiEngine
>>> messages = [
... {"role": "user", "content": "Hello, how are you?"},
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
... {"role": "user", "content": "No need to help, take it easy."},
... ]
>>> HfApiEngine()(messages, stop_sequences=["conversation"])
"That's very kind of you to say! It's always nice to have a relaxed "
class transformers.HfApiEngine
(model: str = 'meta-llama/Meta-Llama-3.1-8B-Instruct', token: typing.Optional[str] = None, max_tokens: typing.Optional[int] = 1500, timeout: typing.Optional[int] = 120)
Parameters
- model (str, optional, defaults to "meta-llama/Meta-Llama-3.1-8B-Instruct") — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- token (str, optional) — Token used by the Hugging Face API for authentication. If not provided, the class will use the token stored in the Hugging Face CLI configuration.
- max_tokens (int, optional, defaults to 1500) — The maximum number of tokens allowed in the output.
- timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
Raises
ValueError — If the model name is not provided.
A class to interact with Hugging Face’s Inference API for language model interaction.
This engine allows you to communicate with Hugging Face’s models using the Inference API. It can be used either in serverless mode or with a dedicated endpoint, and supports features like stop sequences and grammar customization.
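Agent Types
Agents can handle any type of object passed between tools. Tools are fully multimodal, so they can accept and return text, images, audio, video, and other types. To increase compatibility between tools, and to render these return values correctly in ipython (jupyter, colab, ipython notebooks, …), we implement wrapper classes around these types.
The wrapped objects should keep behaving as they did initially; a text object should still behave as a string, and an image object should still behave as a PIL.Image.
These types have three specific purposes:
- Calling to_raw on the type should return the underlying object.
- Calling to_string on the type should return the object as a string: this can be the string itself in the case of AgentText, but in other cases it will be the path to the serialized version of the object.
- Displaying it in an ipython kernel should display the object correctly.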
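The first two purposes can be illustrated with a pair of stand-alone wrapper sketches. TextSketch and BytesSketch are hypothetical classes, not the library's AgentText or AgentImage; the sketch assumes that non-text objects are serialized to a temporary file on the first to_string call, and it leaves out the ipython display hook.

```python
import os
import tempfile


class TextSketch(str):
    """Hypothetical text wrapper: still behaves as a plain string."""

    def to_raw(self) -> str:
        return str(self)

    def to_string(self) -> str:
        return str(self)


class BytesSketch:
    """Hypothetical wrapper for binary content such as an encoded image."""

    def __init__(self, value: bytes):
        self._raw = value
        self._path = None

    def to_raw(self) -> bytes:
        # Return the underlying object untouched.
        return self._raw

    def to_string(self) -> str:
        # Serialize once, then keep returning the same path.
        if self._path is None:
            fd, self._path = tempfile.mkstemp(suffix=".bin")
            with os.fdopen(fd, "wb") as handle:
                handle.write(self._raw)
        return self._path


text = TextSketch("hello")
print(text.upper(), text.to_raw())

blob = BytesSketch(b"\x00\x01")
path = blob.to_string()
print(os.path.exists(path))
```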
AgentText
Text type returned by the agent. Behaves as a string.
AgentImage
Image type returned by the agent. Behaves as a PIL.Image.
save
(output_bytes, format, **params)
Saves the image to a file.
Returns the “raw” version of that object. In the case of an AgentImage, it is a PIL.Image.
Returns the stringified version of that object. In the case of an AgentImage, it is a path to the serialized version of the image.
AgentAudio
Audio type returned by the agent.
Returns the “raw” version of that object. It is a torch.Tensor object.
Returns the stringified version of that object. In the case of an AgentAudio, it is a path to the serialized version of the audio.