For example, the `Long Scoreboard` stall reasons might be an indicator to look at the `uncoalesced global accesses` reported in the `Source Counters` section. In fact, in a real-world scenario, our optimization goal would be a runtime of less than 1 ms on the target hardware. However,...