matthewmrichter committed on
Commit
777a465
1 Parent(s): eb410fb

Trust remote code for SageMaker execution


I'm trying to deploy this model to AWS SageMaker. Per this link (https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-inference-containers), I'm using this container image:
763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

CloudWatch shows this error when the model launches, before the endpoint is even invoked:

```
W-9000-tiiuae__falcon-7b-instruc-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading /.sagemaker/mms/models/tiiuae__falcon-7b-instruct.eb410fb6ffa9028e97adb801f0d6ec46d02f8b07 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```
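
For context, the same requirement reproduces outside SageMaker with plain `transformers` (a minimal local sketch, assuming the container loads the model roughly the same way; note it downloads the full weights):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"

# The model code lives in the Hub repo itself, so transformers refuses
# to execute it without an explicit opt-in via trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```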

Similarly, when I try to invoke the endpoint via Python/boto3, it confirms the issue:

```
Traceback (most recent call last):
  File "/../sagemaker.py", line 44, in <module>
    main()
  File "/../sagemaker.py", line 39, in main
    response = predict_data(sagemaker_runtime, endpoint_name, request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../sagemaker.py", line 7, in predict_data
    response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/tiiuae__falcon-7b-instruct.eb410fb6ffa9028e97adb801f0d6ec46d02f8b07 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code\u003dTrue` to remove this error."
}
```

Here's some of the Terraform I'm using to configure the AWS resources for this:

```
# module.huggingface_sagemaker_falcon.data.aws_sagemaker_prebuilt_ecr_image.deploy_image:
data "aws_sagemaker_prebuilt_ecr_image" "deploy_image" {
  image_tag       = "2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04"
  repository_name = "huggingface-pytorch-inference"
}

# module.huggingface_sagemaker_falcon.aws_sagemaker_model.model_with_hub_model[0]:
resource "aws_sagemaker_model" "model_with_hub_model" {
  enable_network_isolation = false
  execution_role_arn       = aws_iam_role.new_role.arn
  name                     = "falcon-model"

  primary_container {
    environment = {
      "HF_MODEL_ID"          = "tiiuae/falcon-7b-instruct"
      "HF_TASK"              = "text-generation"
      "HF_TRUST_REMOTE_CODE" = "True"
      "HF_MODEL_REVISION"    = "eb410fb6ffa9028e97adb801f0d6ec46d02f8b07"
    }
    image = data.aws_sagemaker_prebuilt_ecr_image.deploy_image.registry_path
    mode  = "SingleModel"
  }
}
```
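
For comparison, here's roughly what I believe the `sagemaker` Python SDK equivalent of that Terraform looks like. This is a sketch: the role ARN and instance type are placeholders, and the version pins are my reading of the image tag, not tested values.

```
from sagemaker.huggingface import HuggingFaceModel

# Rough SDK equivalent of the Terraform above (sketch; placeholders below).
huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",
        "HF_TASK": "text-generation",
        "HF_TRUST_REMOTE_CODE": "True",
        "HF_MODEL_REVISION": "eb410fb6ffa9028e97adb801f0d6ec46d02f8b07",
    },
    role="arn:aws:iam::111122223333:role/sagemaker-execution-role",  # placeholder
    transformers_version="4.28.1",  # matches the image tag above
    pytorch_version="2.0.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumption; any GPU instance that fits the model
)
```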

Here's my Python:

```
import boto3
import json


def predict_data(sagemaker_runtime, endpoint_name, input_data):
    json_input_data = json.dumps(input_data).encode('utf-8')
    response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                                 ContentType='application/json',
                                                 Body=json_input_data)
    return response


def main():
    region_name = 'us-east-2'
    session = boto3.Session(region_name=region_name)
    sagemaker_runtime = session.client('sagemaker-runtime')

    endpoint_name = '<endpoint name>'

    # define prompt
    prompt = """You are the most advanced AI assistant on the planet, called Falcon.

User: How can we set up Kubernetes cluster on AWS? Think step by step.
Falcon:"""

    # hyperparameters for llm
    request = {
        "inputs": prompt
    }

    response = predict_data(sagemaker_runtime, endpoint_name, request)
    print(response)


if __name__ == "__main__":
    main()
```
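
(To see the generated text rather than the raw response metadata, the returned `Body` is a botocore `StreamingBody` that has to be read and decoded, e.g.:)

```
# Decode the StreamingBody payload returned by invoke_endpoint.
result = json.loads(response['Body'].read().decode('utf-8'))
print(result)
```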

It seems that config.json needs this added? On the AWS side, I couldn't find a way to configure the model, endpoint configuration, or endpoint resources to override that.
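
The only other workaround I can think of is shipping a custom handler alongside the model artifact; as I understand the SageMaker Hugging Face Inference Toolkit conventions, a `code/inference.py` exposing `model_fn` could pass the flag itself. A hypothetical, untested sketch:

```
# code/inference.py -- hypothetical custom handler; model_fn is the
# toolkit's override hook for model loading.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def model_fn(model_dir):
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```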

I'm also assuming that `true` is treated as a boolean in that configuration and that the capitalization (`true` vs. `True`) doesn't matter.
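
For what it's worth, JSON itself only accepts the lowercase form, so in config.json it would have to be `true`; a quick check:

```
import json

# JSON booleans are lowercase; this parses to a Python bool.
print(json.loads('{"trust_remote_code": true}'))  # {'trust_remote_code': True}

# A capitalized `True` is not valid JSON and raises JSONDecodeError:
# json.loads('{"trust_remote_code": True}')
```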

I'm very much an ML novice, so if this is a security concern, if there is in fact a way to configure our AWS SageMaker resources (or my Python request) to trust remote code, or if I'm completely in the wrong stratosphere as to how this all works, please feel free to reject this and let me know.

Files changed (1)
  1. config.json +1 -0
config.json CHANGED
@@ -23,6 +23,7 @@
   "parallel_attn": true,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.27.4",
+  "trust_remote_code": true,
   "use_cache": true,
   "vocab_size": 65024
 }