In this paper, we propose a deep architecture for learning semantic dependencies between tokens in a sentence. Instead of using fixed adjacency matrices initialized by dependency trees or prior knowledge, our graph module adopts an adaptive adjacency matrix to encode semantic dependencies. This matrix ...