# Inference Callable

This document provides guidelines for creating an inference callable for PyTriton, which serves as the entry point for handling inference requests. The inference callable receives a list of requests, where each request maps model input names to NumPy ndarrays. Each request also carries custom HTTP/gRPC headers and parameters in its `parameters` dictionary.

## Function

The simplest inference callable is a function that implements the interface for handling requests and responses. The `Request` class contains the following fields:

- `data` - the inputs, stored as a dictionary but also accessible through the request's dict interface, e.g. `request["input_name"]`
- `parameters` - the combined parameters and HTTP/gRPC headers

For more information about parameters and headers, see [here](custom_params.md).

```python
import numpy as np
from typing import Dict, List

from pytriton.proxy.types import Request


def infer_fn(requests: List[Request]) -> List[Dict[str, np.ndarray]]:
    # Process the requests and return one response dictionary per request.
    ...
```

## Class

In many cases it is worth using an instance of a class as the callable. This is especially useful when you want to control the order in which objects or models are initialized.

```python
import numpy as np
from typing import Dict, List

from pytriton.proxy.types import Request


class InferCallable:

    def __call__(self, requests: List[Request]) -> List[Dict[str, np.ndarray]]:
        # Process the requests and return one response dictionary per request.
        ...
```

## Binding to Triton

To use the inference callable with PyTriton, it must be bound to a Triton server instance using the `bind` method:

```python
import numpy as np

from pytriton.triton import Triton
from pytriton.model_config import ModelConfig, Tensor

with Triton() as triton:
    triton.bind(
        model_name="MyInferenceFn",
        infer_func=infer_fn,
        inputs=[Tensor(shape=(1,), dtype=np.float32)],
        outputs=[Tensor(shape=(1,), dtype=np.float32)],
        config=ModelConfig(max_batch_size=8),
    )
    infer_callable = InferCallable()
    triton.bind(
        model_name="MyInferenceCallable",
        infer_func=infer_callable,
        inputs=[Tensor(shape=(1,), dtype=np.float32)],
        outputs=[Tensor(shape=(1,), dtype=np.float32)],
        config=ModelConfig(max_batch_size=8),
    )
```

For more information on serving the inference callable, refer to the [Loading models section](binding_models.md) on the Deploying Models page.
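
To make the interface more concrete, below is a minimal sketch of an inference callable that doubles its input. The computation and the input/output names `INPUT_1` and `OUTPUT_1` are assumptions for this example; the names must match the tensor names declared (or generated) when binding the model.

```python
import numpy as np
from typing import Dict, List

from pytriton.proxy.types import Request


def scaling_infer_fn(requests: List[Request]) -> List[Dict[str, np.ndarray]]:
    responses = []
    for request in requests:
        # Inputs can be read through the dict interface of Request;
        # "INPUT_1" / "OUTPUT_1" are assumed names for this sketch.
        input_batch = request["INPUT_1"]
        responses.append({"OUTPUT_1": input_batch * 2.0})
    return responses
```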
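
Once the bound models are being served (for example, after calling `triton.serve()` inside the `with Triton()` block), they can be queried with a client. The snippet below is a minimal sketch that assumes the default HTTP endpoint `localhost:8000` and the `MyInferenceFn` model bound above:

```python
import numpy as np
from pytriton.client import ModelClient

# The endpoint and model name are assumptions for this sketch;
# adjust them to match your deployment and the name passed to `bind`.
with ModelClient("localhost:8000", "MyInferenceFn") as client:
    batch = np.array([[1.0], [2.0]], dtype=np.float32)
    # Positional inputs are matched to the model inputs in declaration order.
    result = client.infer_batch(batch)
    print(result)
```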