Was the whole word masking technique used during pre-training?
#1 by ryo0634 - opened
Thank you for sharing the fantastic model!
I have a quick question: was the model trained with the whole word masking technique?
No, it was not trained with whole word masking (WWM); this follows the original paper.
Training DeBERTa with WWM is one possible future improvement.
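For readers unfamiliar with the distinction, here is a minimal sketch contrasting token-level masking with whole word masking. It is only an illustration, not this model's training code: the `##` continuation prefix, the function names, the `[MASK]` string, and the 15% rate are all assumptions chosen for the example, and the tokenizer actually used by this model may mark word boundaries differently. The sketch also omits details such as replacing some selected tokens with random or unchanged tokens.

```python
import random


def group_whole_words(tokens):
    """Group token indices into words.

    Assumption for illustration: a token starting with '##' continues
    the previous word (WordPiece-style convention).
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)  # continuation piece joins the current word
        else:
            words.append([i])  # a new word starts here
    return words


def token_level_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask each subword token independently (no WWM)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Select whole words at random and mask every subword piece of each one."""
    rng = random.Random(seed)
    masked = list(tokens)
    for word in group_whole_words(tokens):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked
```

With token-level masking, a word split into several pieces can end up partially masked, so the model may recover the hidden piece from its visible neighbours. With WWM, all pieces of a selected word are hidden together.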
I see, thank you very much for your quick response.
ryo0634 changed discussion status to closed