Model converted by the transformers `pt_to_tf` CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.

Maximum crossload output difference=1.669e-05; Maximum crossload hidden layer difference=9.766e-03;
Maximum conversion output difference=1.669e-05; Maximum conversion hidden layer difference=9.766e-03;

CAUTION: The maximum admissible error was manually increased to 0.1!

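See GitHub PR #25558 for details. The precision threshold was overridden because the TF hidden states differ from the PyTorch ones by more than the standard limit allows (up to 9.766e-03, as reported above). The final output logits are within 1.788e-05 for all model variants and sizes.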
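
For readers unfamiliar with this kind of report, the comparison it summarizes can be sketched roughly as follows. This is a minimal illustration, not the `pt_to_tf` implementation: the helper names, the sample values, and the strict threshold of 5e-5 are all assumptions made for the example. Only the raised threshold of 0.1 comes from the note above.

```python
# Illustrative sketch only; this is NOT the code used by pt_to_tf.

def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def within_tolerance(reference, candidate, max_error):
    """True when no element differs by more than max_error."""
    return max_abs_diff(reference, candidate) <= max_error


# Hypothetical values standing in for flattened PyTorch and TF hidden states.
pt_values = [0.25, -1.5, 3.0]
tf_values = [0.25, -1.5, 3.0078125]  # last element is off by 2**-7

diff = max_abs_diff(pt_values, tf_values)
strict_ok = within_tolerance(pt_values, tf_values, 5e-5)  # hypothetical strict threshold
relaxed_ok = within_tolerance(pt_values, tf_values, 0.1)  # raised threshold from the note

print(f"max difference: {diff:.3e}")
print(f"passes strict threshold: {strict_ok}")
print(f"passes raised threshold: {relaxed_ok}")
```

In this sketch the same difference fails the strict check and passes the relaxed one, which is the situation the CAUTION line describes for the hidden layers.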

