The mean of G(τ) from trajectories generated therefore can serve as a practical, unbiased estimator of J(π). The policy-performance metric J and the trajectory-performance metric G go by many names in the various fields that address the sequential decision-making problem. Fig. 1 shows some...