Was the whole word masking technique used during pre-training?

#1
by ryo0634 - opened

Thank you for sharing the fantastic model!
I have a quick question: was the model trained with the whole word masking (WWM) technique?

Language Media Processing Lab at Kyoto University

No, following the original paper, it was not trained with WWM.
Training DeBERTa with WWM is one possible future improvement.
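To illustrate the distinction the question is about, here is a minimal sketch contrasting per-token masking with whole word masking. It is not the lab's training code. The word segmentation (`word_ids`) is a hypothetical example, and the sketch simplifies standard MLM masking: each unit is masked independently with probability `prob` and always replaced by `[MASK]`, with no fixed masking budget or 80/10/10 replacement scheme.

```python
import random
from typing import List

MASK = "[MASK]"


def token_level_mask(tokens: List[str], prob: float, rng: random.Random) -> List[str]:
    """Decide independently for each subword token whether to mask it.

    Pieces of the same word can end up partially masked.
    """
    return [MASK if rng.random() < prob else tok for tok in tokens]


def whole_word_mask(
    tokens: List[str], word_ids: List[int], prob: float, rng: random.Random
) -> List[str]:
    """Decide once per word, then mask every subword token of the chosen words.

    word_ids[i] is the index of the word that tokens[i] belongs to.
    """
    # Iterate over words in a fixed order so results are reproducible per seed.
    selected = {w for w in sorted(set(word_ids)) if rng.random() < prob}
    return [MASK if w in selected else tok for tok, w in zip(tokens, word_ids)]


if __name__ == "__main__":
    # Hypothetical subword segmentation: "pre" + "##training" form one word.
    tokens = ["the", "pre", "##training", "objective", "uses", "mask", "##ing"]
    word_ids = [0, 1, 1, 2, 3, 4, 4]
    print(token_level_mask(tokens, 0.5, random.Random(1)))
    print(whole_word_mask(tokens, word_ids, 0.5, random.Random(1)))
```

With per-token masking, the model can sometimes recover a masked piece such as `##training` from its unmasked neighbor `pre`; whole word masking removes that shortcut by hiding all pieces of a word together.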

I see. Thank you very much for your quick response.

ryo0634 changed discussion status to closed
