In one general aspect, a method can include translating words spoken by a user while in a scene into text, the scene capable of including a computer-generated virtual element. The method can also include parsing the text into a spoken command fragment. The method can further include identifying a sce...