the scene capable of including a computer-generated virtual element. The method can also include parsing the text into a spoken command fragment. The method can further include identifying a scene definition gesture from gesture information captured for the user in the scene. The ...