Truncated output from API call through langchain
Hi all
I am using a hosted Inference Endpoint on HF and calling it through the HuggingFaceEndpoint wrapper provided by LangChain.
When I ask any question, the output seems to be truncated. Any idea why that might be the case?
Following is my code:
from langchain.llms import HuggingFaceEndpoint
from langchain import HuggingFaceHub
from langchain import PromptTemplate, LLMChain
endpoint_url = (
    'ENDPOINT_URL'
)
hf = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    huggingfacehub_api_token=TOKEN,
    task='text-generation'
)
template = """Question: {question}
Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=hf)
question = "When did Germany unite? "
print(llm_chain.run(question))
And the following is my output:
1990, following the reunification of East
Any help please?
Thanks
HuggingFaceEndpoint truncates the text because it assumes the endpoint returns the prompt together with the generated text. You need to modify the _call method of HuggingFaceEndpoint so that it doesn't slice generated_text and instead returns the whole text.
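To see why the start of the answer disappears, here is a standalone illustration with made-up strings (the prompt follows the template from the question; the response text is hypothetical, and this is plain Python, not LangChain code). When the endpoint does not echo the prompt back, slicing off len(prompt) characters removes the beginning of the answer itself:
# Minimal illustration (hypothetical strings) of the slicing done in _call
prompt = "Question: When did Germany unite? \nAnswer: "

# Case the code assumes: the endpoint echoes the prompt before the answer.
echoed = prompt + "Germany reunited on 3 October 1990."
print(echoed[len(prompt):])       # -> "Germany reunited on 3 October 1990."

# Case of this endpoint: only the answer comes back, so the slice
# throws away the first len(prompt) characters of the answer itself.
answer_only = "Germany reunited on 3 October 1990, following the reunification of East and West Germany."
print(answer_only[len(prompt):])  # -> the answer with its first len(prompt) characters missing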
So you mean the following part of the _call method specifically?:
# Text generation return includes the starter text.
text = generated_text[0]["generated_text"][len(prompt) :]
So I have to change the indexing that currently keeps only the part after the prompt length?
https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_endpoint.py
No, just remove the indexing. The indexing assumes that generated_text includes the prompt (hence it slices generated_text from len(prompt) to the end). Just change it to:
text = generated_text[0]["generated_text"]
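For reference, a rough sketch of what that section of _call would look like after the change, assuming the structure of huggingface_endpoint.py at the link above (only the slicing line changes; double-check the surrounding code in your installed version):
if self.task == "text-generation":
    # This endpoint returns only the newly generated text, so keep the
    # whole string instead of slicing off the first len(prompt) characters.
    text = generated_text[0]["generated_text"]
If you would rather not edit the installed package, subclassing HuggingFaceEndpoint and overriding _call with the same one-line change should also work, but you would need to copy the rest of the method body from the version you have installed.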
Yup that's what I meant. Thank you.