I benchmarked John the Ripper and ImageMagick with the Intel OpenCL SDK installed.
The results (John) --
mscash2-opencl --
OpenCL platform 0: Intel(R) OpenCL, 1 device(s).
Using device 0: Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
Optimal Work Group Size:32
Kernel Execution Speed (Higher is better):0.000444
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 418 c/s real, 105 c/s virtual
mscash2 --
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [128/128 AVX intrinsics 4x]... (4xOMP) DONE
Raw: 2464 c/s real, 620 c/s virtual
wpapsk-opencl --
OpenCL platform 0: Intel(R) OpenCL, 1 device(s).
Using device 0: Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... (4xOMP) DONE
Raw: 790 c/s real, 199 c/s virtual
wpapsk --
Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... (4xOMP) DONE
Raw: 924 c/s real, 232 c/s virtual
bf-opencl --
OpenCL platform 0: Intel(R) OpenCL, 1 device(s).
Using device 0: Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
****Please see 'opencl_bf_std.h' for device specific optimizations****
Benchmarking: OpenBSD Blowfish (x32) [OpenCL]... DONE
Raw: 1131 c/s real, 381 c/s virtual
bf --
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (4xOMP) DONE
Raw: 2448 c/s real, 613 c/s virtual
With ImageMagick convolve filter --
With OpenCL --
convert -convolve -1,0,-1,0,4,0,-1,0,-1 background.png test.png
real 0m5.175s
user 0m5.776s
sys 0m0.064s
Without OpenCL (OpenCL excluded during compile) --
convert -convolve -1,0,-1,0,4,0,-1,0,-1 background.png test.png
real 0m4.985s
user 0m5.560s
sys 0m0.052s
As we can see, the Intel SDK is no good: the OpenCL build is slower than the plain CPU build in every test here. Maybe they could've spent more time developing their hardware and drivers instead of making a shiny GUI in .NET.
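
To put rough numbers on the gap, here is a quick Python sketch that just divides the "real" figures pasted above. The dictionary keys and the helper name `slowdowns` are my own labels, not anything John or ImageMagick prints, and these are single runs, so treat the ratios as ballpark only.

```python
# "real" c/s figures copied from the John benchmarks above: (OpenCL, native CPU)
john = {
    "mscash2": (418, 2464),
    "wpapsk": (790, 924),
    "bf": (1131, 2448),
}

# wall-clock ("real") seconds for the ImageMagick convolve runs above
im_opencl, im_cpu = 5.175, 4.985


def slowdowns(results):
    """Return {format: native c/s divided by OpenCL c/s}."""
    return {name: cpu / ocl for name, (ocl, cpu) in results.items()}


ratios = slowdowns(john)
for name, r in ratios.items():
    print(f"{name}: native build is {r:.2f}x the OpenCL speed")

# extra wall-clock time of the OpenCL build, as a percentage of the non-OpenCL run
im_overhead = (im_opencl - im_cpu) / im_cpu * 100
print(f"convolve: OpenCL build takes {im_overhead:.1f}% longer")
```

That works out to the native build being about 5.9x faster for mscash2, 1.2x for wpapsk and 2.2x for bf, with the OpenCL convolve taking about 3.8% longer.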