# PyTriton remote mode

Remote mode is a way to use PyTriton with a Triton Inference Server running remotely (at the moment the server must be deployed on the same machine, but it may be launched in a different container). To bind a model in remote mode, use the `RemoteTriton` class instead of `Triton`. The only difference when using `RemoteTriton` is that its constructor requires the Triton `url` argument.

## Example of binding a model in remote mode

The example below assumes that the Triton Inference Server is already running on the same machine (launched with PyTriton in a separate Python script; a sketch of such a script follows the example). `RemoteTriton` binds the remote model to the existing Triton Inference Server. When the `RemoteTriton` instance is closed, the model is unloaded from the server.

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import RemoteTriton, TritonConfig

# Triton server response cache configuration (1 MB local cache);
# see the note below the example on where it is applied in remote mode.
triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB
)


@batch
def _add_sub(**inputs):
    # Inputs arrive as named numpy batches; unpack them in declaration order
    # and compute the element-wise sum and difference.
    a_batch, b_batch = inputs.values()
    add_batch = a_batch + b_batch
    sub_batch = a_batch - b_batch
    return {"add": add_batch, "sub": sub_batch}


with RemoteTriton(url="localhost") as triton:
    triton.bind(
        model_name="AddSub",
        infer_func=_add_sub,
        inputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        outputs=[Tensor(shape=(1,), dtype=np.float32), Tensor(shape=(1,), dtype=np.float32)],
        config=ModelConfig(max_batch_size=8, response_cache=True),
    )
    triton.serve()
```
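
Note that `triton_config` above is not passed to `RemoteTriton`: since `response_cache=True` relies on a cache configured on the server itself, a configuration like this is assumed to be applied in the separate script that launches the server. Below is a minimal sketch of how that server-launching script might look when the server is started with PyTriton; the file name `server.py` and the bare start-and-serve setup (no models bound at startup) are illustrative assumptions, not part of the original example.

```python
# server.py - illustrative sketch of the separate script that launches the
# Triton Inference Server with PyTriton so that RemoteTriton scripts can attach to it.
from pytriton.triton import Triton, TritonConfig

# Configure the server-side response cache (1 MB local cache) so that models
# bound remotely with `response_cache=True` can use it.
triton_config = TritonConfig(
    cache_config=[f"local,size={1024 * 1024}"],  # 1MB
)

with Triton(config=triton_config) as triton:
    # Start the server and block; models can be bound later from other
    # processes using RemoteTriton.
    triton.serve()
```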
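
Once the remote script has bound the model, it can be queried like any other PyTriton model. As a usage sketch, the snippet below calls the `AddSub` model with `pytriton.client.ModelClient`; the server address and the sample data are assumptions matching the example above, and the output names follow from the `outputs` declared in the bind call.

```python
import numpy as np

from pytriton.client import ModelClient

# Connect to the Triton server on localhost and query the "AddSub" model
# bound by the RemoteTriton script above.
with ModelClient("localhost", "AddSub") as client:
    a_batch = np.array([[1.0], [2.0]], dtype=np.float32)  # batch of 2 samples, shape (2, 1)
    b_batch = np.array([[3.0], [4.0]], dtype=np.float32)
    # Positional arrays are matched to the model inputs in declaration order.
    result = client.infer_batch(a_batch, b_batch)
    for name, value in result.items():
        print(name, value)
```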