Merge cekal/mpt-7b-peft-compatible
Merges https://huggingface.co/cekal/mpt-7b-peft-compatible by @cekal.
This adds support for PEFT as well as QLoRA.
I verified that QLoRA starts training (see https://github.com/artidoro/qlora/issues/10):
git clone https://huggingface.co/mosaicml/mpt-7b
pushd mpt-7b
git fetch origin refs/pr/42:pr/42
git checkout pr/42
popd
python qlora.py \
--model_name_or_path ./mpt-7b \
--trust_remote_code True \
--output_dir /output \
--dataset alpaca \
--do_train True \
--do_eval True \
--do_mmlu_eval True \
--source_max_len 384 \
--target_max_len 128 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--logging_steps 10 \
--max_steps 10000 \
--save_strategy steps \
--data_seed 42 \
--save_steps 1000 \
--save_total_limit 40 \
--evaluation_strategy steps \
--eval_dataset_size 1024 \
--max_eval_samples 1000 \
--eval_steps 1000 \
--optim paged_adamw_32bit
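For anyone who wants to try PEFT directly rather than through the qlora.py script, here is a minimal sketch of loading this PR revision and attaching a LoRA adapter. The dtype and target_modules=["Wqkv"] (MPT's fused attention projection) are my assumptions; adjust them for your setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load the model from this PR's revision (refs/pr/42).
# device_map="auto" (requires accelerate) shards the weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    revision="refs/pr/42",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")

# Attach a LoRA adapter; target_modules=["Wqkv"] assumes MPT's fused QKV projection name.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["Wqkv"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()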
Any differences to #25?
Looks pretty similar, TBH.
One difference is this line, which is needed to work properly with device_map="auto" (the snippet below is from around L290):
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache, inputs_embeds=inputs_embeds)
last_hidden_state = outputs.last_hidden_state
if self.model_parallel:
    # With device_map="auto" the tied wte weights can end up on a different device
    # than the final hidden states, so move the hidden states over before the LM head.
    last_hidden_state = last_hidden_state.to(self.transformer.wte.weight.device)
logits = F.linear(last_hidden_state, self.transformer.wte.weight)
But that line could also be added there, I suppose.
There might be subtle differences in other places, too, but as I said the code looks pretty similar.
I'm not sure why the additional inputs_embeds parameter is needed. Maybe it's used in cases where the caller already has the embeddings? Does someone know?
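One plausible use case (a guess, not something stated in the PR): callers that already have embeddings, e.g. PEFT's prompt-tuning methods, can skip the token lookup and feed inputs_embeds directly. A minimal sketch, assuming the PR's forward accepts inputs_embeds in place of input_ids:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b", revision="refs/pr/42", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")

input_ids = tokenizer("MPT-7B is", return_tensors="pt").input_ids
# Look up the token embeddings manually (wte is the shared input/output embedding).
embeds = model.transformer.wte(input_ids)
# The forward pass takes the precomputed embeddings instead of token ids.
out = model(inputs_embeds=embeds)
print(out.logits.shape)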
I made a similar version of this for 30B too, on top of the latest foundry changes, and it trains with QLoRA: https://huggingface.co/eluzhnica/mpt-30b-peft-compatible. It trains well from what I've tried.
Can you do the same thing for the 30B version?
I tried this and it gives the error:
TypeError: forward() takes 2 positional arguments but 3 were given
I think this is the same error one gets when setting "--gradient_checkpointing False".
So I know MPT-7B doesn't support gradient checkpointing while using the Hugging Face Trainer, but if you set it to False, you still get the "TypeError: forward() takes 2 positional arguments but 3 were given" error? I have been dealing with that error for weeks now, and this might be the breakthrough I needed to convince me to just abandon MPT altogether.
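For what it's worth, one plausible mechanism for that error (a guess, not the actual MPT code): gradient-checkpointing wrappers such as torch.utils.checkpoint forward their arguments positionally, so a forward() whose extra inputs are keyword-only fails with exactly this message. A minimal stand-in:

import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    # Hypothetical stand-in for a transformer block whose extra inputs are keyword-only.
    def forward(self, x, *, attention_mask=None):
        return x * 2

block = Block()
x = torch.randn(1, 4, 8, requires_grad=True)
mask = torch.ones(1, 4, dtype=torch.bool)

block(x, attention_mask=mask)   # fine: mask passed by keyword
checkpoint(block, x, mask)      # TypeError: forward() takes 2 positional arguments but 3 were given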