## OpenCL/HLS/Verilog Convolution Implementation Comparison Study

| Implementation | Lines of code | Time to compile for emulation | Time to debug code | Time to optimize |
|------------------------------------------|-----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| OpenCL<br>Host: .cpp<br>Kernel: .cl | Host: 131<br>Kernel: 31 | Total time: approx. 6 hours<br>Host: approx. 4 hours<br>Kernel: approx. 3 hours<br>Covers writing the code and getting it to compile successfully. Excludes the time spent removing logical errors. | Total time: approx. 15 hours<br>Host: approx. 6 hours<br>Kernel: approx. 4 hours<br>Covers removing logical errors, learning how to create the .exe and .xclbin files, and learning how to force SW_Emulation in SDx. Also covers the multiple test iterations run until the correct output was achieved. | |
| Vivado HLS | Host (written in SDx): 154<br>Kernel (written in HLS): 42 | Total time: approx. 2.5 hours<br>Kernel: approx. 20 minutes. The kernel is the same one used for OpenCL, with a few pragmas added to give it an AXI interface.<br>Host: approx. 2 hours. The host code was first written with ap_int datatypes, which produced errors; switching to standard C/C++ datatypes resolved the issue. | Total time: approx. 2.5 hours<br>Kernel: approx. 30 minutes. A few errors reported a mismatch between the FIFO depth and the array size in the kernel. The code was changed and C/RTL co-simulation was rerun until it passed in HLS.<br>Host: approx. 2 hours. Some time went into writing the host code, which is very similar to the one used for OpenCL. Using the ap_int datatype caused a few errors, so the code was reverted to standard C datatypes. Most of the debugging time was spent waiting for SDx to build the host and kernel code, which took a little more than 2 hours. | |
| Verilog                                  |                                                                             |                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                     |
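
The Vivado HLS row mentions adding a few pragmas to give the kernel an AXI interface, and a co-simulation error caused by a mismatch between the FIFO depth and the array size. The study does not show the kernel itself, so the following is only a minimal sketch of what such a kernel might look like. It assumes a 1D "valid" convolution over `int` data. The function name `conv1d`, the sizes `kInputLen` and `kTaps`, and the bundle names are hypothetical. The `depth` values are set to match the array sizes, which is one plausible way to avoid the kind of mismatch described in the table.

```cpp
// Hypothetical sizes chosen for illustration; the study does not state them.
constexpr int kInputLen = 16;                    // number of input samples
constexpr int kTaps = 3;                         // number of filter coefficients
constexpr int kOutputLen = kInputLen - kTaps + 1;  // "valid" output length (14)

// Hypothetical kernel: 1D convolution using standard C/C++ datatypes
// (the table notes that ap_int types caused errors in the host code).
extern "C" void conv1d(const int *in, const int *coeff, int *out) {
// Memory-mapped AXI master ports for the data arrays. The depth values
// tell C/RTL co-simulation how many elements each port transfers, so
// they are kept equal to the array sizes above.
#pragma HLS INTERFACE m_axi port=in    offset=slave bundle=gmem depth=16
#pragma HLS INTERFACE m_axi port=coeff offset=slave bundle=gmem depth=3
#pragma HLS INTERFACE m_axi port=out   offset=slave bundle=gmem depth=14
// AXI4-Lite control interface for the pointer offsets and start/done.
#pragma HLS INTERFACE s_axilite port=in     bundle=control
#pragma HLS INTERFACE s_axilite port=coeff  bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    for (int i = 0; i < kOutputLen; ++i) {
        int acc = 0;
        for (int k = 0; k < kTaps; ++k) {
            // Flip the coefficient index so this is a true convolution
            // rather than a correlation.
            acc += in[i + k] * coeff[kTaps - 1 - k];
        }
        out[i] = acc;
    }
}
```

A standard C++ compiler ignores the `#pragma HLS` lines, so the same function can be called directly from a plain C++ test bench before running it through the HLS flow.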