If you are walking B' over A (e.g. a filter or pattern search), then you can parallelize the tile walk. In other words: parallel outer loop, vector inner loop. If you can, consider making the dimensions multiples of your target platf
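To make the "parallel outer, vector inner" idea concrete, here is a minimal sketch. It is an illustration only, not the original poster's code: it uses plain `std::thread` rather than Cilk Plus, and the function name `filter_parallel`, the row-major layout, and the "valid positions only" convention are all assumptions made for the example. Output rows are split across threads, and the innermost loop runs with unit stride over output columns so the compiler has a chance to vectorize it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: slide a kh x kw kernel B over an h x w array A
// (both row-major) and return the (h-kh+1) x (w-kw+1) result.
std::vector<float> filter_parallel(const std::vector<float>& A, std::size_t h, std::size_t w,
                                   const std::vector<float>& B, std::size_t kh, std::size_t kw,
                                   unsigned nthreads)
{
    assert(A.size() == h * w && B.size() == kh * kw && kh <= h && kw <= w);
    const std::size_t oh = h - kh + 1, ow = w - kw + 1;
    std::vector<float> out(oh * ow, 0.0f);
    nthreads = std::max(1u, nthreads);

    // Each worker owns a disjoint band of output rows, so no locking is needed.
    auto worker = [&](std::size_t r0, std::size_t r1) {
        for (std::size_t r = r0; r < r1; ++r) {
            float* dst = &out[r * ow];
            for (std::size_t i = 0; i < kh; ++i) {
                const float* src = &A[(r + i) * w];
                for (std::size_t j = 0; j < kw; ++j) {
                    const float b = B[i * kw + j];
                    // Inner loop: unit stride, a candidate for auto-vectorization.
                    for (std::size_t c = 0; c < ow; ++c)
                        dst[c] += b * src[c + j];
                }
            }
        }
    };

    // Outer level: hand out contiguous bands of rows to the threads.
    std::vector<std::thread> pool;
    const std::size_t chunk = (oh + nthreads - 1) / nthreads;
    for (std::size_t r0 = 0; r0 < oh; r0 += chunk)
        pool.emplace_back(worker, r0, std::min(oh, r0 + chunk));
    for (auto& t : pool) t.join();
    return out;
}
```

Calling `filter_parallel(A, 3, 3, B, 2, 2, 2)` on a 3x3 array with a 2x2 kernel yields a 2x2 result computed by two threads, one output row each.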
Here is the code I want to parallelize. Obviously, the "p->execute" call can be prefixed with a spawn, and there has to be a sync before the "local_execute". However, how do you prevent a task from being executed multiple times? If several threads hit the first conditional and think it...
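One common way to guard against this kind of double execution is to have each thread claim the task with an atomic compare-and-swap, so that only the winner runs it. The sketch below is a generic illustration in standard C++, not the poster's code and not a Cilk Plus API: `Task`, `claimed`, `run_count`, and `race_on_task` are hypothetical names, and the counter merely stands in for the real work.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical task record used only for this illustration.
struct Task {
    std::atomic<bool> claimed{false};
    std::atomic<int>  run_count{0};

    // Returns true only for the single caller that wins the claim.
    bool try_execute() {
        bool expected = false;
        if (!claimed.compare_exchange_strong(expected, true))
            return false;        // another thread already owns this task
        run_count.fetch_add(1);  // stand-in for the real work
        return true;
    }
};

// Let nthreads threads race on the same task; return how many times it ran.
int race_on_task(unsigned nthreads) {
    assert(nthreads > 0);
    Task t;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nthreads; ++i)
        pool.emplace_back([&t] { t.try_execute(); });
    for (auto& th : pool) th.join();
    return t.run_count.load();
}
```

However many threads see the task as "not yet done", the compare-and-swap lets exactly one of them through. Whether this fits depends on the rest of the code, which is not visible here.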
If you desire to .NOT. take the effort to parallelize your code, then you need to do whatever you can to improve the vectorization. This may require you to look at your data layout, but may be as simple as using the appropriate compiler switches, and possibly adding a few compiler direc...
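As a rough illustration of what "looking at your data layout" can mean, the sketch below contrasts an array-of-structures with a structure-of-arrays. The particle types and function names are hypothetical, chosen only for the example. In the SoA form the values being updated are contiguous in memory, which generally makes the loop easier for a compiler to vectorize. Whether it actually helps depends on the compiler and the access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: consecutive x values are 4 floats apart in memory.
struct ParticleAoS { float x, y, z, mass; };

// Structure-of-arrays: each field is stored contiguously.
struct ParticlesSoA { std::vector<float> x, y, z, mass; };

// Hypothetical conversion helper from the AoS layout to the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    out.mass.reserve(in.size());
    for (const auto& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    assert(out.x.size() == in.size());
    return out;
}

// Strided access: each iteration touches one float out of every four.
void scale_x_aos(std::vector<ParticleAoS>& p, float s) {
    for (auto& q : p) q.x *= s;
}

// Unit-stride access: a simple candidate for auto-vectorization.
void scale_x_soa(ParticlesSoA& p, float s) {
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] *= s;
}
```

Both functions compute the same result. The difference is only in how the data is laid out in memory.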
But execution of a particular function / computation in Cilk Plus may switch threads at strand boundaries. Both Windows and Unix-flavor Cilk Plus runtimes have similar behavior, so you can usually think of them as the same. The warning about the use of thread-specific data is there in large ...
As to which is faster (your auto-generated code or my specific code), that can be tested by anyone who has both codes. Assuming x is unknown at compile time, it is not clear to me how you could parallelize this. That said, you could have the compiler identify this ...