Obtaining $Q$, $K$, and $V$ is more direct: each distinct input element (word) has its own Query vector, Key vector, and Value vector. All three are produced by multiplying the word's embedding vector by the projection matrices $W^Q$, $W^K$, and $W^V$ [5]. Multiplying x1 by the WQ weight matrix produces q1, the "query" vector associated with that word. We end up creating a "query", a...
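To make the projections concrete, here is a minimal NumPy sketch. The dimensions (512 for the embedding, 64 for the projections) and the random weights are assumptions for illustration only; in a real model the projection matrices are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 512, 64                       # assumed embedding / projection sizes

x1 = rng.standard_normal(d_model)            # embedding vector of one input word
W_Q = rng.standard_normal((d_model, d_k))    # projection matrices (learned in practice,
W_K = rng.standard_normal((d_model, d_k))    # random here just for the sketch)
W_V = rng.standard_normal((d_model, d_k))

q1 = x1 @ W_Q   # "query" vector for this word
k1 = x1 @ W_K   # "key" vector
v1 = x1 @ W_V   # "value" vector
print(q1.shape, k1.shape, v1.shape)          # (64,) (64,) (64,)
```

The same three multiplications are applied to every word in the input, so each word ends up with its own query, key, and value.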
An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. That sounds a bit abstract, so let's draw an analogy with a search engine (information retrieval): when you look something up online, you...
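Viewed this way, the attention function is a "soft lookup": the query is scored against every key, the scores are normalized into weights, and the output is the weighted average of the values. The sketch below shows this for a single query using scaled dot-product attention; the shapes and random inputs are assumptions for illustration.

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for one query (a sketch).

    q: (d_k,) query vector; K: (n, d_k) keys; V: (n, d_v) values.
    Returns a single output vector of shape (d_v,).
    """
    scores = K @ q / np.sqrt(K.shape[1])     # similarity of the query to each key
    weights = np.exp(scores - scores.max())  # softmax over the n keys
    weights /= weights.sum()
    return weights @ V                       # weighted average of the values

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((5, 64))   # 5 key-value pairs, like 5 indexed documents
V = rng.standard_normal((5, 64))
print(attention(q, K, V).shape)    # (64,)
```

Unlike an ordinary dictionary lookup, which returns exactly one value for an exact key match, this soft lookup returns a blend of all values, weighted by how well each key matches the query.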