# Model Performance Benchmarks

All benchmarks run as per:

```
python onnx_export.py --model mobilenetv3_100 ./mobilenetv3_100.onnx
python onnx_optimize.py ./mobilenetv3_100.onnx --output mobilenetv3_100-opt.onnx
python onnx_to_caffe.py ./mobilenetv3_100.onnx --c2-prefix mobilenetv3
python onnx_to_caffe.py ./mobilenetv3_100-opt.onnx --c2-prefix mobilenetv3-opt
python caffe2_benchmark.py --c2-init ./mobilenetv3.init.pb --c2-predict ./mobilenetv3.predict.pb
python caffe2_benchmark.py --c2-init ./mobilenetv3-opt.init.pb --c2-predict ./mobilenetv3-opt.predict.pb
```

## EfficientNet-B0

### Unoptimized
```
Main run finished. Milliseconds per iter: 49.2862. Iters per second: 20.2897
Time per operator type:
        29.7378 ms. 60.5145%. Conv
        12.1785 ms. 24.7824%. Sigmoid
        3.62811 ms. 7.38297%. SpatialBN
        2.98444 ms. 6.07314%. Mul
        0.326902 ms. 0.665225%. AveragePool
        0.197317 ms. 0.401528%. FC
        0.0852877 ms. 0.173555%. Add
        0.0032607 ms. 0.00663532%. Squeeze
        49.1416 ms in Total
FLOP per operator type:
        0.76907 GFLOP. 95.2696%. Conv
        0.0269508 GFLOP. 3.33857%. SpatialBN
        0.00846444 GFLOP. 1.04855%. Mul
        0.002561 GFLOP. 0.317248%. FC
        0.000210112 GFLOP. 0.0260279%. Add
        0.807256 GFLOP in Total
Feature Memory Read per operator type:
        58.5253 MB. 43.0891%. Mul
        43.2015 MB. 31.807%. Conv
        27.2869 MB. 20.0899%. SpatialBN
        5.12912 MB. 3.77631%. FC
        1.6809 MB. 1.23756%. Add
        135.824 MB in Total
Feature Memory Written per operator type:
        33.8578 MB. 38.1965%. Mul
        26.9881 MB. 30.4465%. Conv
        26.9508 MB. 30.4044%. SpatialBN
        0.840448 MB. 0.948147%. Add
        0.004 MB. 0.00451258%. FC
        88.6412 MB in Total
Parameter Memory per operator type:
        15.8248 MB. 74.9391%. Conv
        5.124 MB. 24.265%. FC
        0.168064 MB. 0.795877%. SpatialBN
        0 MB. 0%. Add
        0 MB. 0%. Mul
        21.1168 MB in Total
```

### Optimized
```
Main run finished. Milliseconds per iter: 46.0838. Iters per second: 21.6996
Time per operator type:
        29.776 ms. 65.002%. Conv
        12.2803 ms. 26.8084%. Sigmoid
        3.15073 ms. 6.87815%. Mul
        0.328651 ms. 0.717456%. AveragePool
        0.186237 ms. 0.406563%. FC
        0.0832429 ms. 0.181722%. Add
        0.0026184 ms. 0.00571606%. Squeeze
        45.8078 ms in Total
FLOP per operator type:
        0.76907 GFLOP. 98.5601%. Conv
        0.00846444 GFLOP. 1.08476%. Mul
        0.002561 GFLOP. 0.328205%. FC
        0.000210112 GFLOP. 0.0269269%. Add
        0.780305 GFLOP in Total
Feature Memory Read per operator type:
        58.5253 MB. 53.8803%. Mul
        43.2855 MB. 39.8501%. Conv
        5.12912 MB. 4.72204%. FC
        1.6809 MB. 1.54749%. Add
        108.621 MB in Total
Feature Memory Written per operator type:
        33.8578 MB. 54.8834%. Mul
        26.9881 MB. 43.7477%. Conv
        0.840448 MB. 1.36237%. Add
        0.004 MB. 0.00648399%. FC
        61.6904 MB in Total
Parameter Memory per operator type:
        15.8248 MB. 75.5403%. Conv
        5.124 MB. 24.4597%. FC
        0 MB. 0%. Add
        0 MB. 0%. Mul
        20.9488 MB in Total
```

## EfficientNet-B1

### Optimized
```
Main run finished. Milliseconds per iter: 71.8102. Iters per second: 13.9256
Time per operator type:
        45.7915 ms. 66.3206%. Conv
        17.8718 ms. 25.8841%. Sigmoid
        4.44132 ms. 6.43244%. Mul
        0.51001 ms. 0.738658%. AveragePool
        0.233283 ms. 0.337868%. Add
        0.194986 ms. 0.282402%. FC
        0.00268255 ms. 0.00388519%. Squeeze
        69.0456 ms in Total
FLOP per operator type:
        1.37105 GFLOP. 98.7673%. Conv
        0.0138759 GFLOP. 0.99959%. Mul
        0.002561 GFLOP. 0.184489%. FC
        0.000674432 GFLOP. 0.0485847%. Add
        1.38816 GFLOP in Total
Feature Memory Read per operator type:
        94.624 MB. 54.0789%. Mul
        69.8255 MB. 39.9062%. Conv
        5.39546 MB. 3.08357%. Add
        5.12912 MB. 2.93136%. FC
        174.974 MB in Total
Feature Memory Written per operator type:
        55.5035 MB. 54.555%. Mul
        43.5333 MB. 42.7894%. Conv
        2.69773 MB. 2.65163%. Add
        0.004 MB. 0.00393165%. FC
        101.739 MB in Total
Parameter Memory per operator type:
        25.7479 MB. 83.4024%. Conv
        5.124 MB. 16.5976%. FC
        0 MB. 0%. Add
        0 MB. 0%. Mul
        30.8719 MB in Total
```

## EfficientNet-B2

### Optimized
```
Main run finished. Milliseconds per iter: 92.28. Iters per second: 10.8366
Time per operator type:
        61.4627 ms. 67.5845%. Conv
        22.7458 ms. 25.0113%. Sigmoid
        5.59931 ms. 6.15701%. Mul
        0.642567 ms. 0.706568%. AveragePool
        0.272795 ms. 0.299965%. Add
        0.216178 ms. 0.237709%. FC
        0.00268895 ms. 0.00295677%. Squeeze
        90.942 ms in Total
FLOP per operator type:
        1.98431 GFLOP. 98.9343%. Conv
        0.0177039 GFLOP. 0.882686%. Mul
        0.002817 GFLOP. 0.140451%. FC
        0.000853984 GFLOP. 0.0425782%. Add
        2.00568 GFLOP in Total
Feature Memory Read per operator type:
        120.609 MB. 54.9637%. Mul
        86.3512 MB. 39.3519%. Conv
        6.83187 MB. 3.11341%. Add
        5.64163 MB. 2.571%. FC
        219.433 MB in Total
Feature Memory Written per operator type:
        70.8155 MB. 54.6573%. Mul
        55.3273 MB. 42.7031%. Conv
        3.41594 MB. 2.63651%. Add
        0.004 MB. 0.00308731%. FC
        129.563 MB in Total
Parameter Memory per operator type:
        30.4721 MB. 84.3913%. Conv
        5.636 MB. 15.6087%. FC
        0 MB. 0%. Add
        0 MB. 0%. Mul
        36.1081 MB in Total
```

## MixNet-M

### Optimized
```
Main run finished. Milliseconds per iter: 63.1122. Iters per second: 15.8448
Time per operator type:
        48.1139 ms. 75.2052%. Conv
        7.1341 ms. 11.1511%. Sigmoid
        2.63706 ms. 4.12189%. SpatialBN
        1.73186 ms. 2.70701%. Mul
        1.38707 ms. 2.16809%. Split
        1.29322 ms. 2.02139%. Concat
        1.00093 ms. 1.56452%. Relu
        0.235309 ms. 0.367803%. Add
        0.221579 ms. 0.346343%. FC
        0.219315 ms. 0.342803%. AveragePool
        0.00250145 ms. 0.00390993%. Squeeze
        63.9768 ms in Total
FLOP per operator type:
        0.675273 GFLOP. 95.5827%. Conv
        0.0221072 GFLOP. 3.12921%. SpatialBN
        0.00538445 GFLOP. 0.762152%. Mul
        0.003073 GFLOP. 0.434973%. FC
        0.000642488 GFLOP. 0.0909421%. Add
        0 GFLOP. 0%. Concat
        0 GFLOP. 0%. Relu
        0.70648 GFLOP in Total
Feature Memory Read per operator type:
        46.8424 MB. 30.502%. Conv
        36.8626 MB. 24.0036%. Mul
        22.3152 MB. 14.5309%. SpatialBN
        22.1074 MB. 14.3955%. Concat
        14.1496 MB. 9.21372%. Relu
        6.15414 MB. 4.00735%. FC
        5.1399 MB. 3.34692%. Add
        153.571 MB in Total
Feature Memory Written per operator type:
        32.7672 MB. 28.4331%. Conv
        22.1072 MB. 19.1831%. Concat
        22.1072 MB. 19.1831%. SpatialBN
        21.5378 MB. 18.689%. Mul
        14.1496 MB. 12.2781%. Relu
        2.56995 MB. 2.23003%. Add
        0.004 MB. 0.00347092%. FC
        115.243 MB in Total
Parameter Memory per operator type:
        13.7059 MB. 68.674%. Conv
        6.148 MB. 30.8049%. FC
        0.104 MB. 0.521097%. SpatialBN
        0 MB. 0%. Add
        0 MB. 0%. Concat
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        19.9579 MB in Total
```

## TF MobileNet-V3 Large 1.0

### Optimized
```
Main run finished. Milliseconds per iter: 22.0495. Iters per second: 45.3525
Time per operator type:
        17.437 ms. 80.0087%. Conv
        1.27662 ms. 5.8577%. Add
        1.12759 ms. 5.17387%. Div
        0.701155 ms. 3.21721%. Mul
        0.562654 ms. 2.58171%. Relu
        0.431144 ms. 1.97828%. Clip
        0.156902 ms. 0.719936%. FC
        0.0996858 ms. 0.457402%. AveragePool
        0.00112455 ms. 0.00515993%. Flatten
        21.7939 ms in Total
FLOP per operator type:
        0.43062 GFLOP. 98.1484%. Conv
        0.002561 GFLOP. 0.583713%. FC
        0.00210867 GFLOP. 0.480616%. Mul
        0.00193868 GFLOP. 0.441871%. Add
        0.00151532 GFLOP. 0.345377%. Div
        0 GFLOP. 0%. Relu
        0.438743 GFLOP in Total
Feature Memory Read per operator type:
        34.7967 MB. 43.9391%. Conv
        14.496 MB. 18.3046%. Mul
        9.44828 MB. 11.9307%. Add
        9.26157 MB. 11.6949%. Relu
        6.0614 MB. 7.65395%. Div
        5.12912 MB. 6.47673%. FC
        79.193 MB in Total
Feature Memory Written per operator type:
        17.6247 MB. 35.8656%. Conv
        9.26157 MB. 18.847%. Relu
        8.43469 MB. 17.1643%. Mul
        7.75472 MB. 15.7806%. Add
        6.06128 MB. 12.3345%. Div
        0.004 MB. 0.00813985%. FC
        49.1409 MB in Total
Parameter Memory per operator type:
        16.6851 MB. 76.5052%. Conv
        5.124 MB. 23.4948%. FC
        0 MB. 0%. Add
        0 MB. 0%. Div
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        21.8091 MB in Total
```

## MobileNet-V3 (RW)

### Unoptimized
```
Main run finished. Milliseconds per iter: 24.8316. Iters per second: 40.2712
Time per operator type:
        15.9266 ms. 69.2624%. Conv
        2.36551 ms. 10.2873%. SpatialBN
        1.39102 ms. 6.04936%. Add
        1.30327 ms. 5.66773%. Div
        0.737014 ms. 3.20517%. Mul
        0.639697 ms. 2.78195%. Relu
        0.375681 ms. 1.63378%. Clip
        0.153126 ms. 0.665921%. FC
        0.0993787 ms. 0.432184%. AveragePool
        0.0032632 ms. 0.0141912%. Squeeze
        22.9946 ms in Total
FLOP per operator type:
        0.430616 GFLOP. 94.4041%. Conv
        0.0175992 GFLOP. 3.85829%. SpatialBN
        0.002561 GFLOP. 0.561449%. FC
        0.00210961 GFLOP. 0.46249%. Mul
        0.00173891 GFLOP. 0.381223%. Add
        0.00151626 GFLOP. 0.33241%. Div
        0 GFLOP. 0%. Relu
        0.456141 GFLOP in Total
Feature Memory Read per operator type:
        34.7354 MB. 36.4363%. Conv
        17.7944 MB. 18.6658%. SpatialBN
        14.5035 MB. 15.2137%. Mul
        9.25778 MB. 9.71113%. Relu
        7.84641 MB. 8.23064%. Add
        6.06516 MB. 6.36216%. Div
        5.12912 MB. 5.38029%. FC
        95.3317 MB in Total
Feature Memory Written per operator type:
        17.6246 MB. 26.7264%. Conv
        17.5992 MB. 26.6878%. SpatialBN
        9.25778 MB. 14.0387%. Relu
        8.43843 MB. 12.7962%. Mul
        6.95565 MB. 10.5477%. Add
        6.06502 MB. 9.19713%. Div
        0.004 MB. 0.00606568%. FC
        65.9447 MB in Total
Parameter Memory per operator type:
        16.6778 MB. 76.1564%. Conv
        5.124 MB. 23.3979%. FC
        0.0976 MB. 0.445674%. SpatialBN
        0 MB. 0%. Add
        0 MB. 0%. Div
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        21.8994 MB in Total
```

### Optimized
```
Main run finished. Milliseconds per iter: 22.0981. Iters per second: 45.2527
Time per operator type:
        17.146 ms. 78.8965%. Conv
        1.38453 ms. 6.37084%. Add
        1.30991 ms. 6.02749%. Div
        0.685417 ms. 3.15391%. Mul
        0.532589 ms. 2.45068%. Relu
        0.418263 ms. 1.92461%. Clip
        0.15128 ms. 0.696106%. FC
        0.102065 ms. 0.469648%. AveragePool
        0.0022143 ms. 0.010189%. Squeeze
        21.7323 ms in Total
FLOP per operator type:
        0.430616 GFLOP. 98.1927%. Conv
        0.002561 GFLOP. 0.583981%. FC
        0.00210961 GFLOP. 0.481051%. Mul
        0.00173891 GFLOP. 0.396522%. Add
        0.00151626 GFLOP. 0.34575%. Div
        0 GFLOP. 0%. Relu
        0.438542 GFLOP in Total
Feature Memory Read per operator type:
        34.7842 MB. 44.833%. Conv
        14.5035 MB. 18.6934%. Mul
        9.25778 MB. 11.9323%. Relu
        7.84641 MB. 10.1132%. Add
        6.06516 MB. 7.81733%. Div
        5.12912 MB. 6.61087%. FC
        77.5861 MB in Total
Feature Memory Written per operator type:
        17.6246 MB. 36.4556%. Conv
        9.25778 MB. 19.1492%. Relu
        8.43843 MB. 17.4544%. Mul
        6.95565 MB. 14.3874%. Add
        6.06502 MB. 12.5452%. Div
        0.004 MB. 0.00827378%. FC
        48.3455 MB in Total
Parameter Memory per operator type:
        16.6778 MB. 76.4973%. Conv
        5.124 MB. 23.5027%. FC
        0 MB. 0%. Add
        0 MB. 0%. Div
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        21.8018 MB in Total
```

## MnasNet-A1

### Unoptimized
```
Main run finished. Milliseconds per iter: 30.0892. Iters per second: 33.2345
Time per operator type:
        24.4656 ms. 79.0905%. Conv
        4.14958 ms. 13.4144%. SpatialBN
        1.60598 ms. 5.19169%. Relu
        0.295219 ms. 0.95436%. Mul
        0.187609 ms. 0.606486%. FC
        0.120556 ms. 0.389724%. AveragePool
        0.09036 ms. 0.292109%. Add
        0.015727 ms. 0.050841%. Sigmoid
        0.00306205 ms. 0.00989875%. Squeeze
        30.9337 ms in Total
FLOP per operator type:
        0.620598 GFLOP. 95.6434%. Conv
        0.0248873 GFLOP. 3.8355%. SpatialBN
        0.002561 GFLOP. 0.394688%. FC
        0.000597408 GFLOP. 0.0920695%. Mul
        0.000222656 GFLOP. 0.0343146%. Add
        0 GFLOP. 0%. Relu
        0.648867 GFLOP in Total
Feature Memory Read per operator type:
        35.5457 MB. 38.4109%. Conv
        25.1552 MB. 27.1829%. SpatialBN
        22.5235 MB. 24.339%. Relu
        5.12912 MB. 5.54256%. FC
        2.40586 MB. 2.59978%. Mul
        1.78125 MB. 1.92483%. Add
        92.5406 MB in Total
Feature Memory Written per operator type:
        24.9042 MB. 32.9424%. Conv
        24.8873 MB. 32.92%. SpatialBN
        22.5235 MB. 29.7932%. Relu
        2.38963 MB. 3.16092%. Mul
        0.890624 MB. 1.17809%. Add
        0.004 MB. 0.00529106%. FC
        75.5993 MB in Total
Parameter Memory per operator type:
        10.2732 MB. 66.1459%. Conv
        5.124 MB. 32.9917%. FC
        0.133952 MB. 0.86247%. SpatialBN
        0 MB. 0%. Add
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        15.5312 MB in Total
```

### Optimized
```
Main run finished. Milliseconds per iter: 24.2367. Iters per second: 41.2597
Time per operator type:
        22.0547 ms. 91.1375%. Conv
        1.49096 ms. 6.16116%. Relu
        0.253417 ms. 1.0472%. Mul
        0.18506 ms. 0.76473%. FC
        0.112942 ms. 0.466717%. AveragePool
        0.086769 ms. 0.358559%. Add
        0.0127889 ms. 0.0528479%. Sigmoid
        0.0027346 ms. 0.0113003%. Squeeze
        24.1994 ms in Total
FLOP per operator type:
        0.620598 GFLOP. 99.4581%. Conv
        0.002561 GFLOP. 0.41043%. FC
        0.000597408 GFLOP. 0.0957417%. Mul
        0.000222656 GFLOP. 0.0356832%. Add
        0 GFLOP. 0%. Relu
        0.623979 GFLOP in Total
Feature Memory Read per operator type:
        35.6127 MB. 52.7968%. Conv
        22.5235 MB. 33.3917%. Relu
        5.12912 MB. 7.60406%. FC
        2.40586 MB. 3.56675%. Mul
        1.78125 MB. 2.64075%. Add
        67.4524 MB in Total
Feature Memory Written per operator type:
        24.9042 MB. 49.1092%. Conv
        22.5235 MB. 44.4145%. Relu
        2.38963 MB. 4.71216%. Mul
        0.890624 MB. 1.75624%. Add
        0.004 MB. 0.00788768%. FC
        50.712 MB in Total
Parameter Memory per operator type:
        10.2732 MB. 66.7213%. Conv
        5.124 MB. 33.2787%. FC
        0 MB. 0%. Add
        0 MB. 0%. Mul
        0 MB. 0%. Relu
        15.3972 MB in Total
```

## MnasNet-B1

### Unoptimized
```
Main run finished. Milliseconds per iter: 28.3109. Iters per second: 35.322
Time per operator type:
        29.1121 ms. 83.3081%. Conv
        4.14959 ms. 11.8746%. SpatialBN
        1.35823 ms. 3.88675%. Relu
        0.186188 ms. 0.532802%. FC
        0.116244 ms. 0.332647%. Add
        0.018641 ms. 0.0533437%. AveragePool
        0.0040904 ms. 0.0117052%. Squeeze
        34.9451 ms in Total
FLOP per operator type:
        0.626272 GFLOP. 96.2088%. Conv
        0.0218266 GFLOP. 3.35303%. SpatialBN
        0.002561 GFLOP. 0.393424%. FC
        0.000291648 GFLOP. 0.0448034%. Add
        0 GFLOP. 0%. Relu
        0.650951 GFLOP in Total
Feature Memory Read per operator type:
        34.4354 MB. 41.3788%. Conv
        22.1299 MB. 26.5921%. SpatialBN
        19.1923 MB. 23.0622%. Relu
        5.12912 MB. 6.16333%. FC
        2.33318 MB. 2.80364%. Add
        83.2199 MB in Total
Feature Memory Written per operator type:
        21.8266 MB. 34.0955%. Conv
        21.8266 MB. 34.0955%. SpatialBN
        19.1923 MB. 29.9805%. Relu
        1.16659 MB. 1.82234%. Add
        0.004 MB. 0.00624844%. FC
        64.016 MB in Total
Parameter Memory per operator type:
        12.2576 MB. 69.9104%. Conv
        5.124 MB. 29.2245%. FC
        0.15168 MB. 0.865099%. SpatialBN
        0 MB. 0%. Add
        0 MB. 0%. Relu
        17.5332 MB in Total
```

### Optimized
```
Main run finished. Milliseconds per iter: 26.6364. Iters per second: 37.5426
Time per operator type:
        24.9888 ms. 94.0962%. Conv
        1.26147 ms. 4.75011%. Relu
        0.176234 ms. 0.663619%. FC
        0.113309 ms. 0.426672%. Add
        0.0138708 ms. 0.0522311%. AveragePool
        0.00295685 ms. 0.0111341%. Squeeze
        26.5566 ms in Total
FLOP per operator type:
        0.626272 GFLOP. 99.5466%. Conv
        0.002561 GFLOP. 0.407074%. FC
        0.000291648 GFLOP. 0.0463578%. Add
        0 GFLOP. 0%. Relu
        0.629124 GFLOP in Total
Feature Memory Read per operator type:
        34.5112 MB. 56.4224%. Conv
        19.1923 MB. 31.3775%. Relu
        5.12912 MB. 8.3856%. FC
        2.33318 MB. 3.81452%. Add
        61.1658 MB in Total
Feature Memory Written per operator type:
        21.8266 MB. 51.7346%. Conv
        19.1923 MB. 45.4908%. Relu
        1.16659 MB. 2.76513%. Add
        0.004 MB. 0.00948104%. FC
        42.1895 MB in Total
Parameter Memory per operator type:
        12.2576 MB. 70.5205%. Conv
        5.124 MB. 29.4795%. FC
        0 MB. 0%. Add
        0 MB. 0%. Relu
        17.3816 MB in Total
```
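
## Comparing unoptimized and optimized runs

For the four models above that list both an unoptimized and an optimized run, the effect of the optimization pass can be read off the `Milliseconds per iter` summary lines. The following is a minimal sketch, not part of the benchmark scripts: `parse_summary`, `time_saved_percent` and the `TIMINGS` table are hypothetical helper names introduced here, and the timings are copied from the summary lines in this document.

```python
import re

# Matches the summary line at the top of each benchmark block, e.g.
# "Main run finished. Milliseconds per iter: 49.2862. Iters per second: 20.2897"
SUMMARY_RE = re.compile(
    r"Milliseconds per iter: ([0-9]+(?:\.[0-9]+)?)\. "
    r"Iters per second: ([0-9]+(?:\.[0-9]+)?)"
)


def parse_summary(line):
    """Return (ms_per_iter, iters_per_second) from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a benchmark summary line: {line!r}")
    return float(match.group(1)), float(match.group(2))


def time_saved_percent(unopt_ms, opt_ms):
    """Percentage reduction in time per iteration after optimization."""
    return (unopt_ms - opt_ms) / unopt_ms * 100.0


# (unoptimized ms/iter, optimized ms/iter), copied from the results above.
TIMINGS = {
    "EfficientNet-B0": (49.2862, 46.0838),
    "MobileNet-V3 (RW)": (24.8316, 22.0981),
    "MnasNet-A1": (30.0892, 24.2367),
    "MnasNet-B1": (28.3109, 26.6364),
}

if __name__ == "__main__":
    for name, (unopt_ms, opt_ms) in TIMINGS.items():
        saved = time_saved_percent(unopt_ms, opt_ms)
        print(f"{name}: {saved:.1f}% less time per iteration")
```

The comparison uses the headline `Milliseconds per iter` values rather than the per-operator `in Total` rows, which differ slightly from the headline figure in several blocks. On these numbers the optimized graphs take roughly 6% to 19% less time per iteration, with MnasNet-A1 showing the largest reduction.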