and then, once the user has clicked the start button, the robot system begins to localize the person's position in the scene using a deep neural network. Once the user is in the scene, many ROI candidates can be found in the mobile phone's scene through the remote desktop on the robot...