Trust remote code for SageMaker execution

#85

I'm trying to deploy this model to AWS SageMaker. Per this link (https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-inference-containers), I'm using this model image:
763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

CloudWatch shows this error when the model launches, before the endpoint is even invoked:

W-9000-tiiuae__falcon-7b-instruc-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading /.sagemaker/mms/models/tiiuae__falcon-7b-instruct.eb410fb6ffa9028e97adb801f0d6ec46d02f8b07 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

Similarly, when I try to invoke the endpoint via Python/boto3, it confirms the issue:

Traceback (most recent call last):
  File "/../sagemaker.py", line 44, in <module>
    main()
  File "/../sagemaker.py", line 39, in main
    response = predict_data(sagemaker_runtime, endpoint_name, request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../sagemaker.py", line 7, in predict_data
    response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/tiiuae__falcon-7b-instruct.eb410fb6ffa9028e97adb801f0d6ec46d02f8b07 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code\u003dTrue` to remove this error."
}
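
For context, this is the same flag I'd pass when loading the model locally with transformers. A minimal local-loading sketch (it would need a GPU with enough memory for the 7B weights):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"

# trust_remote_code=True lets the custom modeling code shipped in the model
# repo execute; only set it after reviewing that code.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)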

Here's some of the Terraform I'm using to configure the AWS resources for this:

# module.huggingface_sagemaker_falcon.data.aws_sagemaker_prebuilt_ecr_image.deploy_image:
data "aws_sagemaker_prebuilt_ecr_image" "deploy_image" {
  image_tag       = "2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04"
  repository_name = "huggingface-pytorch-inference"
}

# module.huggingface_sagemaker_falcon.aws_sagemaker_model.model_with_hub_model[0]:
resource "aws_sagemaker_model" "model_with_hub_model" {
  enable_network_isolation = false
  execution_role_arn       = aws_iam_role.new_role.arn
  name                     = "falcon-model"

  primary_container {
    environment = {
      "HF_MODEL_ID"          = "tiiuae/falcon-7b-instruct"
      "HF_TASK"              = "text-generation"
      "HF_TRUST_REMOTE_CODE" = "True"
      "HF_MODEL_REVISION"    = "eb410fb6ffa9028e97adb801f0d6ec46d02f8b07"
    }
    image = data.aws_sagemaker_prebuilt_ecr_image.deploy_image.registry_path
    mode  = "SingleModel"
  }
}
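
For comparison, my understanding is that the equivalent deployment through the sagemaker Python SDK would pass the same environment variables via HuggingFaceModel, roughly like this (a sketch, not something I've tested; the instance type and version pins are my assumptions):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# same environment the Terraform primary_container sets
hub = {
    "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",
    "HF_TASK": "text-generation",
    "HF_TRUST_REMOTE_CODE": "True",
    "HF_MODEL_REVISION": "eb410fb6ffa9028e97adb801f0d6ec46d02f8b07",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.28.1",  # assumed version pins matching the image tag
    pytorch_version="2.0.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumption: a single-GPU instance large enough for the model
)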

Here's my Python:

import boto3
import json


def predict_data(sagemaker_runtime, endpoint_name, input_data):
    json_input_data = json.dumps(input_data).encode('utf-8')
    response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                                 ContentType='application/json',
                                                 Body=json_input_data)
    return response


def main():
    region_name = 'us-east-2'
    session = boto3.Session(region_name=region_name)
    sagemaker_runtime = session.client('sagemaker-runtime')

    endpoint_name = '<endpoint name>'

    # define prompt
    prompt = """You are the most advanced AI assistant on the planet, called Falcon.

    User: How can we set up Kubernetes cluster on AWS? Think step by step.
    Falcon:"""

    # request payload for the llm (no extra generation parameters set)
    request = {
        "inputs": prompt
    }

    response = predict_data(sagemaker_runtime, endpoint_name, request)
    print(response)


if __name__ == "__main__":
    main()
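
If it matters, reading the actual prediction out of that response would look roughly like this, since invoke_endpoint hands back the payload as a streaming Body:

import json

# response['Body'] is a botocore StreamingBody; read and decode it once
result = json.loads(response['Body'].read().decode('utf-8'))
print(result)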

It seems like config.json needs this added? On the AWS side, I couldn't find a way to configure the model, endpoint configuration, or endpoint resources to override it.

I'm also assuming that the value is treated as a boolean in that configuration and that the capitalization (true vs. True) doesn't matter.
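
In other words, I'm assuming the container parses that variable case-insensitively, along these lines (purely illustrative; I haven't checked the toolkit's actual code, and parse_bool here is a hypothetical helper):

import os

def parse_bool(value):
    # hypothetical: treat "true"/"True"/"1" the same way
    return str(value).strip().lower() in ("true", "1")

trust_remote_code = parse_bool(os.environ.get("HF_TRUST_REMOTE_CODE", "false"))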

I'm very much an ML novice, so if this is a security concern, if there is in fact a way to configure our AWS SageMaker resources or my Python request to trust remote code, or if I'm completely in the wrong stratosphere as to how this all works, please feel free to reject this and let me know.

I'm closing this. I figured out how to test, and this does not actually accomplish what I was looking to do. For those curious, the key was to run the model on a text-generation-specific, pre-built image: 763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.0.3-gpu-py39-cu118-ubuntu20.04.
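
In case it helps anyone else, deploying against that TGI image with the sagemaker Python SDK looks roughly like this (a sketch based on my setup; the instance type and SM_NUM_GPUS value are specific to my case):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

llm_model = HuggingFaceModel(
    role=role,
    # text-generation-inference (TGI) image instead of the generic inference image
    image_uri="763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.0.3-gpu-py39-cu118-ubuntu20.04",
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",
        "HF_MODEL_REVISION": "eb410fb6ffa9028e97adb801f0d6ec46d02f8b07",
        "SM_NUM_GPUS": "1",  # number of GPUs on the serving instance
    },
)

predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)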

matthewmrichter changed pull request status to closed
