Understanding CoreML conversion of llama 2 7b

by kharish89 - opened Jul 18, 2023

Jul 18, 2023

Could you kindly provide more details on the hardware used and the conversion process, in a blog or guide style? Many community members could benefit from the learnings.

xiaoymin

Jul 19, 2023

pcuenq

Core ML Projects org Jul 19, 2023

I'll publish a guide focused on conversion in a few days!

In addition, we need to provide some of the pieces required to perform text generation with the converted model: tokenizers, text generation strategies, etc. Working on it!

graelo

Jul 22, 2023

Thanks @pcuenq, I'm looking forward to it ❤️
Where will you publish it?

ekelund

Jul 23, 2023

That would be amazing! TIA.

SriBalaaji

Jul 24, 2023

Thanks for the work

Leszekasdfff

Jul 25, 2023

Hi! Thanks a lot for your work! Where can the guide be found?

JECuello

Aug 1, 2023

Amazing work! Is there a variant of the diffusers app adapted for querying Llama 2?

pcuenq

Core ML Projects org Aug 23, 2023

In case you didn't see it, we published swift-transformers and this post a couple of weeks ago: https://huggingface.co/blog/swift-coreml-llm

Please let us know if that's helpful, or if you'd like us to dive deeper into any of the topics :)

Ovats

Aug 24, 2023

Complete noob here, but would it be possible to show how to run the Core ML model? I am trying to build a stock app that processes news and summarizes it for a given stock. When I load the model, it requires an attention mask, and the inputs are integer arrays. I'm not sure how to use the tokenizer with Core ML.

kharish89

Aug 25, 2023

Thanks for sharing!

Xenova

Core ML Projects org Aug 25, 2023

edited Aug 25, 2023

@Ovats Perhaps this section in the blog post could help! It covers how to do tokenization in Swift with swift-transformers.

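```swift
import Tokenizers

func testTokenizer() async throws {
    // Download and load the tokenizer published alongside the converted model
    let tokenizer = try await AutoTokenizer.from(pretrained: "pcuenq/Llama-2-7b-chat-coreml")
    // Calling the tokenizer on a string returns its token IDs (starting with the BOS token, 1)
    let inputIds = tokenizer("Today she took a train to the West")
    assert(inputIds == [1, 20628, 1183, 3614, 263, 7945, 304, 278, 3122])
}
```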

The swift-transformers library is still new, though, and @pcuenq will keep improving it to make it even easier to use! Perhaps he can add some extra context here too.

pcuenq

Core ML Projects org Aug 27, 2023

Hi @Ovats!

The swift-transformers library will deal with many of those details automatically. I would recommend taking a look at the swift-chat example app, which simply calls generate with a prompt and a configuration object; swift-transformers does the rest. Under the hood, it will:

Tokenize the prompt, using code similar to what @Xenova posted above.
Invoke the model repeatedly, because language models produce one token at a time. For example, the greedySearch generation method runs a loop that picks the most probable token at each step and appends it to the output.
Prepare a suitable attention mask when necessary (not all models require it).
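To make those steps concrete, here is a minimal plain-Swift sketch of a greedy generation loop with an attention mask. It is not swift-transformers' actual implementation: `NextTokenLogits`, `padded(_:to:padTokenId:)` and `greedySearch(...)` are hypothetical names, and it assumes a model with a fixed input length, where shorter sequences are right-padded and the padding is masked out with zeros. In a real app, the `model` closure would wrap a Core ML prediction and return the logits at the last real (unmasked) position.

```swift
/// Hypothetical stand-in for one forward pass of the converted model:
/// takes padded input IDs plus an attention mask and returns the logits
/// for the next token (one score per vocabulary entry).
typealias NextTokenLogits = (_ inputIds: [Int], _ attentionMask: [Int]) -> [Float]

/// Right-pads `tokens` to `length` and builds the matching attention mask:
/// 1 for real tokens, 0 for padding. If there are too many tokens,
/// only the most recent `length` tokens are kept.
func padded(_ tokens: [Int], to length: Int, padTokenId: Int) -> (ids: [Int], mask: [Int]) {
    let kept = Array(tokens.suffix(length))
    let padCount = length - kept.count
    let ids = kept + Array(repeating: padTokenId, count: padCount)
    let mask = Array(repeating: 1, count: kept.count) + Array(repeating: 0, count: padCount)
    return (ids, mask)
}

/// Greedy search: call the model once per new token, take the argmax of
/// the logits, append it, and stop early at the end-of-sequence token.
func greedySearch(
    prompt: [Int],
    maxNewTokens: Int,
    contextLength: Int,
    padTokenId: Int,
    eosTokenId: Int,
    model: NextTokenLogits
) -> [Int] {
    var tokens = prompt
    for _ in 0..<maxNewTokens {
        let (ids, mask) = padded(tokens, to: contextLength, padTokenId: padTokenId)
        let logits = model(ids, mask)
        // Index of the highest-scoring token (first one wins on ties)
        guard let next = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        tokens.append(next)
        if next == eosTokenId { break }
    }
    return tokens
}
```

Keeping the model behind a closure lets the loop be exercised with a toy "model" before wiring in Core ML; the prompt IDs would come from the tokenizer call shown earlier in the thread.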

Please let us know if that helps!

rradjabi

May 9, 2024

@pcuenq

How can we use this model in swift-chat and target the ANE (Apple Neural Engine)?

Proryanator

May 9, 2024

There's a great repo here that discusses ways to get a model running on the ANE, but it doesn't appear to be guaranteed: whether a model actually uses the ANE is largely a black box. But you can try!

https://github.com/hollance/neural-engine
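One knob you do control is which compute units Core ML may use. The sketch below (the function names are hypothetical) sets `MLModelConfiguration.computeUnits` to `.cpuAndNeuralEngine`, which excludes the GPU but still does not force the ANE: Core ML decides per operation and falls back to the CPU for anything the ANE cannot run. This option requires macOS 13 / iOS 16 or later, and `compiledModelURL` is a placeholder for the location of your compiled `.mlmodelc`.

```swift
import CoreML
import Foundation

/// Configuration that lets Core ML choose only between the CPU and the
/// Neural Engine (the GPU is excluded). ANE use is still not guaranteed.
@available(macOS 13.0, iOS 16.0, *)
func makeNeuralEngineConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine
    return config
}

/// Loads a compiled Core ML model (.mlmodelc) with the configuration above.
@available(macOS 13.0, iOS 16.0, *)
func loadPreferringNeuralEngine(compiledModelURL: URL) throws -> MLModel {
    try MLModel(contentsOf: compiledModelURL, configuration: makeNeuralEngineConfiguration())
}
```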

Proryanator

Jul 28, 2024

@pcuenq, do you have the code or scripts you used to convert the llama2-hf model to Core ML?

I'm currently stuck on this issue: https://github.com/huggingface/exporters/issues/76 (the same happens with any Llama 2-based model).
