Object detection training examples for K210

  • Continuing the discussion started at https://github.com/kendryte/nncase/issues/86. My initial intention is to find an out-of-the-box example of object detection model training to start with. I've successfully tried pre-built models based on tinyYOLO (I think they were the 20-class and face detection models) in both the MaixPy and C SDK environments, and they work just fine, but the next step is to try training one from scratch.

    The most informative tutorial I found specifically for object detection is this one. It is very detailed, but things seem to have changed since it was written, so several things now have to be fixed to get the final .kmodel successfully. I seem to have made it work, but it just detects nothing for me. No raccoons found 🙂

    Another way, suggested by krishnak in the GitHub issue discussion mentioned above, is to try out the Ultra fast face detector, which seems to be on Caffe. This is also a way to convert some PyTorch models, which I'm personally more comfortable with. Still, this way includes two conversion steps instead of one and is apparently asking for even more problems 🙂

    So yes, I just wanted to ask whether there are other tutorials on adapting object detection models for use on K210, fresh enough to be compatible with recent nncase and related tools.

  • A little bit of answering myself: on the Sipeed forum I was pointed to a very nice and detailed guide on YOLOv2 model training. Well, it's the same model again, but this guide is very comprehensive, up to date (the model is ported to TensorFlow 2) and runs flawlessly at the moment of writing. I would definitely recommend looking through it.

  • Yes, running a separate NMS step on the CPU adds overhead on K210.

  • @tensilestrength No, not really. Would you like to have it in the network just to speed it up by running it on the neural unit? You can take a look at a CUDA implementation of NMS, e.g. in torchvision. The NMS kernel itself seems to be made of quite basic operations that are available on K210 as well.
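
    To make the point about basic operations concrete, here is a minimal NumPy sketch of greedy NMS. It is not the torchvision kernel, just the textbook algorithm; the function names, the (x1, y1, x2, y2) box layout and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def iou(box, boxes):
    """IoU of one box against an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Intersection is zero when the boxes do not overlap.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = np.array([[10, 10, 50, 50],
                  [12, 12, 52, 52],
                  [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

    Only comparisons, min/max, multiplication and one division are involved; the sequential loop over sorted boxes is the part that is awkward to express inside a convolutional network.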

  • Do you have experience with any implementation of non-max suppression as part of a convolutional neural network?

  • @tensilestrength thank you! No, I actually haven't found this exact article, but I've seen the one it refers to, by DmitryM8. That indeed looks pretty straightforward, but it's all about image classification, not object detection, and it also requires training on ImageNet, which on a CPU may consume 'some time' (tm) 🙂 But yes, anyway, that's a way to go as well.

  • @aclex I found this - unless you have already found it

    Colab link for transfer learning

  • @tensilestrength Apparently it depends on the model: if it has some CUDA-specific instructions, it cannot be converted to ONNX on a CPU-only machine. Namely, every available torchvision distribution I came across was implicitly compiled with CUDA code paths enabled, so some operations still require CUDA to be available. Probably the same holds for this model. No quick fix comes to mind, unfortunately.

  • @aclex My CUDA issues can be seen here; they appeared when I tried to convert the PyTorch model to ONNX first. I couldn't get beyond that. If you can figure out a fix, please do let me know.

    Edit: https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB/issues/137

  • Thank you very much for the detailed guide, @tensilestrength! Yes, I also consider the Keras way the primary one, at least for now. The only thing worrying me about it is the incompatibility between different TensorFlow and Keras versions; it's quite hard to find the proper ones for a given piece of code 🙂

    By the way, have you experienced problems with exporting PyTorch models to ONNX without a GPU, or with using the resulting ONNX models? It seems quite strange if they've made ONNX export impossible without a GPU. Another issue could surely be with the versions again: I saw some distributions declared as CPU-only that still depend on the GPU. So yes, everyone seems to promise us happy debugging sessions 🙂 But yes, that's how it is.

  • I have given up on the PyTorch route, as I didn't have the CUDA-enabled machine that the converter was expecting for the conversion to ONNX. Even when set to CPU-only mode, it was throwing "GPU not found" errors. There were too many new unknowns to firefight.

    I guess for the time being the simpler route for a newbie is to use Keras for model creation. If you can handle PyTorch with so little documentation (or at least I couldn't find much), Keras should get you flying quickly. Use the latest version of Keras, 2.1 if I am right.

    Then use:

    tflite_convert --keras_model_file=yourmodel.h5 --output_file=newfile.tflite --enable_v1_converter

    Once you have the model in tflite format, use nncase:

    ./ncc compile yourmodel.tflite yourmodel.kmodel -i tflite  --dataset 1.jpg

    import time
    import numpy as np
    from tensorflow import keras

    # A single Dense unit learns y = w*x + b.
    model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
    model.compile(optimizer="sgd", loss="mean_squared_error")

    # Training data for y = 2x - 1.
    a = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 20.0], dtype=float)
    b = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 19.0, 39.0], dtype=float)
    model.fit(a, b, epochs=500, verbose=0)  # epoch count is an arbitrary choice
    model.save("yourmodel.h5")              # input file for tflite_convert

    # Compare the inference time with computing the formula directly.
    for x in range(10, 100, 10):
        start = time.time()
        output = model.predict(np.array([[x]], dtype=float))
        elapsed = time.time() - start
        print("Inference result %f : time taken = %f" % (output[0][0], elapsed))
        t1 = time.time()
        directcalc = (2 * x) - 1
        t2 = time.time()
        print("Direct calculation result %f : time taken = %f" % (directcalc, t2 - t1))

    The code above creates a new model for the simple formula y = 2x - 1, to test the full conversion process described above.

    Since this model outputs floats, use the following syntax for the kmodel conversion:

    ./ncc compile kky2x.tflite y2x.kmodel -i tflite  --inference-type float
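
    As a rough illustration of why float inference suits this toy model, here is a minimal sketch of generic 8-bit affine quantization. It is not nncase's actual quantizer; the helper names and the value range 0..200 are hypothetical, and the only assumption is that a non-float path would store values as 8-bit integers.

```python
import numpy as np


def quantize_uint8(x, lo, hi):
    """Map floats in [lo, hi] onto 0..255 (generic affine scheme)."""
    scale = (hi - lo) / 255.0
    q = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return q, scale


def dequantize_uint8(q, lo, scale):
    """Map 8-bit codes back to floats."""
    return q.astype(np.float64) * scale + lo


lo, hi = 0.0, 200.0   # hypothetical output range
value = 159.0         # exact value of 2*80 - 1
q, scale = quantize_uint8(np.array([value]), lo, hi)
restored = dequantize_uint8(q, lo, scale)
error = abs(restored[0] - value)
print("step size %f, restored %f, error %f" % (scale, restored[0], error))
```

    With these assumed numbers the step size is about 0.78, so a regression output can be off by up to about 0.39 after a round trip, which matters for a model whose whole job is to output an exact number.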

    In the C code for K210 (refer to the examples in the nncase GitHub repository), once the model is loaded you just need to run it and read the output:

    const float test[] = {80};
    /* task is a kpu_model_context_t already set up with kpu_load_kmodel();
       wait for the ai_done callback before reading the output. */
    kpu_run_kmodel(&task, (const uint8_t *)test, DMAC_CHANNEL5, ai_done, NULL);
    float *result;
    size_t output_size;
    kpu_get_output(&task, 0, (uint8_t **)&result, &output_size);
    printf("result : %f\n", result[0]);

    It should print 159.xxx. The inference time is around 6 µs on K210. I am not sure whether it is using the CPU or the KPU, considering that this is a float operation.

    I expect some tinkering will be needed when you try to port any existing model to kmodel.
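
    The expected value of roughly 159 for an input of 80 can be cross-checked on a PC without TensorFlow. The sketch below is not the Keras model itself: it swaps in a closed-form least-squares fit via np.polyfit, and generates the targets directly from y = 2x - 1 (both are assumptions made for illustration).

```python
import numpy as np

# Same inputs as the training data; targets generated from y = 2x - 1.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 20.0])
ys = 2.0 * xs - 1.0

# A degree-1 fit recovers the weight and bias a single Dense unit should learn.
slope, intercept = np.polyfit(xs, ys, deg=1)

prediction = slope * 80.0 + intercept
print("w = %f, b = %f, prediction for 80 = %f" % (slope, intercept, prediction))
```

    A trained network will only approximate these values, which is why the K210 output reads 159.xxx rather than exactly 159.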