However, these above methods modeled in either a discrete or a continuous action space, which restricted the optimization of offloading decisions in limited action space. In reality, the action space of offloading problem is generally continuous-discrete hybrid [22]. The agent should decide continuous...
The algorithm tunes the weight (temperature) of the entropy factor adaptive to the received rewards: less reward means more exploration is needed, while higher observed rewards in the long run cause reduced entropy. Figure 2. The SAC reinforcement learning framework [50]. The black arrows ...
Reward shaping [22] is a means of manually tuning and modifying fine-grained reward signal values for robots in different states. The tuning that is performed in reward shaping is intuitive and highly dependent on the expert experience of the person conducting the process. Inappropriate rewards ...
They proposed a methodology that combines a fuzzy C-means method for assigning camera location points to each UAV with a route optimization algorithm for calculating the visit order of the camera location points for each UAV by solving the Multiple Traveling Salesmen Problem (MTSP). Alhindi et ...