in partially observable tasks, these reward events can occur with no context for some of the agents. The advantage of the joint reward is that it provides a salient signal that every agent can learn from, as well as additional information about the performance of team members that may or may not be obs...