dropout_add_ln_fwd doesn't seem to change the random state, which causes consecutive calls to use an identical dropout mask. I traced the problem to the counter offset being zero (https://github.com/HazyResearch/flash-attention/blob/main...
layer_norm::FwdParams &params = launch_params.params;
params.rows = rows;
@@ -252,6 +247,11 @@ std::vector<at::Tensor> dropout_add_ln_fwd(const at::Tensor &x0, // Input:
params.rowscale_const = rowscale_const;
params.is_rms_norm = is_rms_norm;/...
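The failure mode described in the issue can be reproduced with a toy model of a counter-based generator. This is a minimal C++ sketch, not flash-attention's or PyTorch's code. `counter_hash` uses the splitmix64 finalizer as a stand-in for Philox, and `ToyGenerator`, `reserve`, and `dropout_mask` are hypothetical names chosen for illustration. The sketch shows that if each launch reserves zero counters, every call starts from the same offset and draws the same mask, while reserving a nonzero increment per launch produces fresh masks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a counter-based RNG such as Philox (this is the splitmix64
// finalizer, NOT Philox). The output depends only on (seed, counter), so the
// same pair always produces the same 64 bits.
inline std::uint64_t counter_hash(std::uint64_t seed, std::uint64_t counter) {
    std::uint64_t z = seed + 0x9E3779B97F4A7C15ULL * (counter + 1);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical generator with a fixed seed and a running counter offset.
// Each kernel launch reserves `increment` counters and receives the offset
// it should start from.
struct ToyGenerator {
    std::uint64_t seed;
    std::uint64_t offset;

    explicit ToyGenerator(std::uint64_t s) : seed(s), offset(0) {}

    // Returns the current offset, then advances it by `increment`.
    // With increment == 0 every caller gets the same offset (the reported bug).
    std::uint64_t reserve(std::uint64_t increment) {
        std::uint64_t start = offset;
        offset += increment;
        return start;
    }
};

// Builds a keep-mask of length n: element i is kept when its uniform draw
// is >= p_drop. Element i uses counter (offset + i), so n counters are consumed.
std::vector<bool> dropout_mask(std::uint64_t seed, std::uint64_t offset,
                               std::size_t n, double p_drop) {
    std::vector<bool> keep(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Top 53 bits scaled into [0, 1).
        double u = static_cast<double>(counter_hash(seed, offset + i) >> 11)
                   / 9007199254740992.0;  // 2^53
        keep[i] = u >= p_drop;
    }
    return keep;
}
```

For example, two calls to `dropout_mask(g.seed, g.reserve(0), 256, 0.5)` return identical vectors, whereas calling `g.reserve(256)` before each launch yields different ones. In a real CUDA kernel the increment is typically the number of random values each thread draws rather than the total element count; the sketch collapses that to a single thread.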
MrSoloDolo, 2018-11-2 15:06, via Weibo (weibo.com): Lil Pump said "Harvard Dropout" will be released at the end of November or in December.