Deploy with SageMaker
I followed the instructions on the model page under Deploy --> Amazon SageMaker --> SageMaker SDK and ran this deploy.py:
```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'Snowflake/snowflake-arctic-embed-m-long'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei", version="1.2.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})
```
I receive this error:

```
UnexpectedStatusException: Error hosting endpoint tei-2024-07-10-22-05-53-662: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
```

In the CloudWatch logs I found:

```
Error: Model backend is not healthy

Caused by:
    unexpected rank, expected: 2, got: 1 ([768])
```
I was able to successfully create a SageMaker endpoint for Snowflake/snowflake-arctic-embed-l, but I need this long-context variant. Please let me know how to overcome this error.
You need SageMaker to support trust_remote_code; see https://huggingface.co/tiiuae/falcon-7b-instruct/commit/777a465507c47b7c7377c6bff3fb783ee81dd787
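To see whether a downloaded repo actually depends on custom code, a quick check is to look for an `auto_map` entry in its config.json, which is how Transformers points at modeling classes shipped inside the model repo. This is a minimal sketch, assuming the repo declares its custom classes that way; the helper name `needs_remote_code` and the local path in the usage line are hypothetical.

```python
import json
from pathlib import Path


def needs_remote_code(config_path: str) -> bool:
    """Hypothetical helper: True if config.json declares custom classes via `auto_map`.

    `auto_map` entries resolve to Python modules stored in the model repo,
    which Transformers only executes when trust_remote_code=True is passed.
    """
    config = json.loads(Path(config_path).read_text())
    return bool(config.get("auto_map"))
```

Usage (hypothetical local path): `needs_remote_code("snowflake-arctic-embed-m-long/config.json")`.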
I downloaded the model locally and modified config.json by adding "trust_remote_code": true. The end of the modified file:

```json
  ...
  "torch_dtype": "float32",
  "transformers_version": "4.36.1",
  "trust_remote_code": true,
  "type_vocab_size": 2,
  "use_cache": true,
  "use_flash_attn": true,
  "use_rms_norm": false,
  "use_xentropy": true,
  "vocab_size": 30528
}
```
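The manual edit above can be scripted so it is repeatable when re-downloading the model. This is a sketch, assuming the model files sit in a local directory; the helper name `set_trust_remote_code` is hypothetical. It writes keys sorted with 2-space indentation, the same layout Transformers uses when saving a config.

```python
import json
from pathlib import Path


def set_trust_remote_code(model_dir: str) -> dict:
    """Hypothetical helper: add "trust_remote_code": true to <model_dir>/config.json in place."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config["trust_remote_code"] = True
    # Sorted keys + indent=2 keeps the file diff-friendly against the original
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")
    return config
```

This only changes the file on disk; whether the serving container reads this key is a separate question.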
I then compressed it into a tar.gz following the instructions here: https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment
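The key detail in that packaging step is that the model files must sit at the root of the archive (config.json, not model_dir/config.json), as `cd model_dir && tar zcvf model.tar.gz *` produces. A minimal Python sketch of the same root-level layout, using the standard `tarfile` module; the helper name `make_model_artifact` is hypothetical, and `output_path` should be outside `model_dir` so the archive does not pack itself. Unlike `tar *`, this also includes hidden files.

```python
import tarfile
from pathlib import Path


def make_model_artifact(model_dir: str, output_path: str = "model.tar.gz") -> list:
    """Hypothetical helper: pack the *contents* of model_dir into a gzipped tarball.

    Returns the member names so the layout can be checked before uploading.
    """
    root = Path(model_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        for child in sorted(root.iterdir()):
            # arcname drops the model_dir prefix; subdirectories are added recursively
            tar.add(child, arcname=child.name)
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()
```

If the returned names start with the directory name, the archive was built one level too high.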
I was able to create the SageMaker endpoint:
```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

trust_remote_code = True
hub = {
    'HF_MODEL_ID': 'Snowflake/snowflake-arctic-embed-m-long',
    'HF_TASK': 'feature-extraction',
    'HF_MODEL_TRUST_REMOTE_CODE': json.dumps(trust_remote_code)
}

huggingface_model = HuggingFaceModel(
    model_data="s3://sagemaker-us-gov-west-1-077510649301/huggingface-models/snowflake-arctic-embed-m-long-config-mod.tar.gz",  # path to your trained SageMaker model
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.26",  # Transformers version used
    pytorch_version="1.13",  # PyTorch version used
    py_version='py39',  # Python version used
    env=hub,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="snowflake-arctic-embed-m-long",
)
```
However, I get a trust_remote_code error when invoking the endpoint:

```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/Snowflake__snowflake-arctic-embed-m-long requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code\u003dTrue
to remove this error."
}
". See https://us-gov-west-1.console.aws.amazon.com/cloudwatch/home?region=us-gov-west-1#logEventViewer:group=/aws/sagemaker/Endpoints/iproposal-sandbox-embedding-snowflake-arctic-embed-m-long in account 077510649301 for more information.
```
Any solution to this?