I'm using a 3x3 kernel with an input image of size 3x320x240 and an output of 12x320x240, including batchnorm and leaky ReLU, and I get about 130 GMAC/s. Does the add cost half the time? Some demo boards quote 230 Gmul/s.
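
For reference, here is a rough sketch of the arithmetic behind those figures. It assumes stride 1, "same" padding, no bias, and that batchnorm and leaky ReLU are not counted as MACs; none of that is stated above, and the function and variable names are just for illustration. It counts the MACs in one pass of the layer and converts the measured rate into separate multiply and add operations, taking one MAC to be one multiply plus one add.

```python
def conv_macs(c_in, c_out, height, width, k):
    """MACs for one k x k convolution (assumed: stride 1, same padding, no bias)."""
    return k * k * c_in * c_out * height * width

# Layer from the question: 3 -> 12 channels, 320x240 feature map, 3x3 kernel.
# Which of 320/240 is height or width does not change the count.
macs = conv_macs(c_in=3, c_out=12, height=240, width=320, k=3)

measured_mac_rate = 130e9            # measured figure, MAC/s
seconds_per_pass = macs / measured_mac_rate

# One MAC = one multiply + one add, so the same rate counted as separate ops:
ops_per_second = 2 * measured_mac_rate

print(f"MACs per pass:    {macs}")
print(f"Time per pass:    {seconds_per_pass * 1e6:.1f} us")
print(f"Equivalent ops/s: {ops_per_second / 1e9:.0f} G")
```

Under these assumptions, 130 GMAC/s already includes 130 G multiplies and 130 G adds per second, so it is not directly comparable to a board figure given in Gmul/s unless that figure says how the adds are counted.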