A bit from the sidelines, but it seems that non-trig functions, for which this PR should be irrelevant, become quite a bit slower (e.g. np.positive and np.conjugate, where the operation itself costs very little time). It would be good to understand why that is, because it seems avoidable -- if...
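
To make the slowdown easy to check, here is a minimal timing sketch that could be run on both main and the PR branch. The array size, repeat counts, and the `time_ufunc` helper are my own assumptions for illustration and are not taken from the PR's benchmark suite; np.sin is only included as a trig reference point.

```python
import timeit

import numpy as np


def time_ufunc(ufunc, arr, repeat=5, number=200):
    """Best observed per-call time in seconds for ufunc(arr, out=out)."""
    # Preallocate the output so allocation cost stays out of the measurement.
    out = np.empty_like(arr)
    timer = timeit.Timer(lambda: ufunc(arr, out=out))
    # Take the minimum over repeats to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical input: a contiguous float64 array (size chosen arbitrarily).
arr = np.linspace(-1.0, 1.0, 10_000)

timings = {
    name: time_ufunc(getattr(np, name), arr)
    for name in ("positive", "conjugate", "sin")
}

for name, seconds in timings.items():
    print(f"np.{name}: {seconds * 1e6:.2f} us per call")
```

Comparing the np.positive and np.conjugate numbers between the two builds should show whether the overhead is in the cheap loops themselves or somewhere shared.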