Why can this program successfully predict the next word while passing in only the most recently generated token? The complete prompt tokens are not passed in again.
I got it: it's because LLaMA keeps `cache_k` and `cache_v` (the KV cache), so the keys and values for all earlier tokens are already stored and only the new token's query/key/value need to be computed each step.
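A minimal sketch of the idea (hypothetical code, not the actual LLaMA implementation): each step appends the new token's key and value to the cache, then attends over every cached position, so the prefix never has to be re-fed.

```python
import math

class TinyKVCacheAttention:
    """Toy single-head attention that stores K/V for past tokens."""

    def __init__(self):
        self.cache_k = []  # one key vector per past position
        self.cache_v = []  # one value vector per past position

    def step(self, q, k, v):
        # Append this token's key/value, then attend over ALL cached
        # positions -- only the newest token's q/k/v is passed in.
        self.cache_k.append(k)
        self.cache_v.append(v)
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
                  for key in self.cache_k]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(v)
        return [sum(w * val[d] for w, val in zip(weights, self.cache_v))
                for d in range(dim)]

attn = TinyKVCacheAttention()
out1 = attn.step([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])  # prompt token
out2 = attn.step([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])  # new token only
print(len(attn.cache_k))  # cache now covers both positions
```

Because the cache already holds the keys and values for every previous position, step two still "sees" the whole sequence even though only one token was fed in.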