Why can this program predict the next word when only the token generated in the previous step is passed in? The complete prompt tokens are not passed in.

#21
by LJUN9988 - opened


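I got it. It's because LLaMA keeps a key/value cache in each attention layer (`cache_k` and `cache_v`). The prompt is run through the model once on the first forward pass, and the keys and values for every position are stored in those caches. After that, each step only passes in the newest token together with its position (`start_pos`). Attention for that token reads the keys and values of all earlier tokens from the cache, so the full prompt never has to be passed in again.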
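To see why this works, here is a minimal sketch in plain Python. It assumes a single attention head, one layer, no rotary position embeddings, and random weights. `CachedAttention`, `full_causal_attention`, and the other helpers are hypothetical names for illustration; only `cache_k`/`cache_v` echo the names in LLaMA. The sketch feeds a 3-token "prompt" in one call, then feeds the remaining tokens one at a time, and checks that the outputs match recomputing causal attention over the whole sequence.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class CachedAttention:
    """Hypothetical single-head attention layer with a key/value cache."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.cache_k = []  # keys of every position seen so far
        self.cache_v = []  # values of every position seen so far

    def forward(self, xs):
        """xs: only the NEW token vectors (whole prompt first, then one token)."""
        outs = []
        for x in xs:
            self.cache_k.append(matvec(self.Wk, x))
            self.cache_v.append(matvec(self.Wv, x))
            q = matvec(self.Wq, x)
            # The new query attends to all cached keys/values, old and new.
            outs.append(attend(q, self.cache_k, self.cache_v))
        return outs


def full_causal_attention(Wq, Wk, Wv, xs):
    """Reference: recompute everything from the full sequence, no cache."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return [attend(matvec(Wq, x), keys[: i + 1], values[: i + 1]) for i, x in enumerate(xs)]


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(6, d)  # 6 stand-in token embeddings

layer = CachedAttention(Wq, Wk, Wv)
cached = layer.forward(tokens[:3])  # first call: the whole "prompt"
for x in tokens[3:]:
    cached += layer.forward([x])  # later calls: one token at a time

reference = full_causal_attention(Wq, Wk, Wv, tokens)
same = all(abs(a - b) < 1e-9 for ra, rb in zip(cached, reference) for a, b in zip(ra, rb))
print("cached decoding matches full recomputation:", same)
```

Because attention is causal, the output at position i depends only on the keys and values at positions 0..i, and those never change once computed, so storing them is enough. In Meta's reference implementation the caches are preallocated tensors that are written at `start_pos`, rather than lists that are appended to as in this sketch.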
