While working on a simple educational reimplementation of CTC, I found that torch.logsumexp produces a nan gradient if all of its inputs happen to be -inf (it also produces a -inf output, but that is not a problem). A zero gradient would be much better in this case, since zero accumulates fine with other non-nan gradients...
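A minimal repro, plus one possible zero-gradient workaround. The `logsumexp_safe` helper is my own sketch, not a PyTorch API: it clamps the sum of exponentials away from zero, and since `clamp_min` has a zero gradient in the clamped region, the all `-inf` case gets a zero gradient instead of nan (at the cost of returning a large finite negative value instead of `-inf` there):

```python
import torch

# Repro: all inputs -inf -> output is -inf, gradient is nan.
x = torch.full((3,), float('-inf'), requires_grad=True)
torch.logsumexp(x, dim=0).backward()
print(x.grad)  # tensor([nan, nan, nan])

# Hypothetical workaround (a sketch, not the official fix):
def logsumexp_safe(t, dim):
    m = t.amax(dim=dim, keepdim=True)
    m = m.masked_fill(m == float('-inf'), 0.0)   # avoid -inf - (-inf) = nan
    s = (t - m).exp().sum(dim=dim, keepdim=True)
    # clamp_min blocks the gradient where s == 0 (all inputs -inf),
    # so those positions get a zero gradient rather than nan.
    out = s.clamp_min(torch.finfo(t.dtype).tiny).log() + m
    return out.squeeze(dim)
```

For inputs that are not all -inf, the clamp is inactive and the result matches torch.logsumexp exactly.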
```python
masked_tensor = replace_masked_values(tensor, mask, 0.0)
# total value
total_tensor = torch.sum(masked_tensor, dim)
# count
count_tensor = torch.sum((mask != 0), dim)
# set zero count to 1 to avoid nans
zero_count_mask = (count_tensor == 0)
count_plus_zeros = (count_tensor + zero_count_mask)...
```
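The masked-mean pattern in the snippet above can be run end to end with this minimal sketch. The `replace_masked_values` helper and the final division are my own fill-ins for the parts not shown, under the assumption of a float tensor and a 0/1 mask:

```python
import torch

def replace_masked_values(tensor, mask, value):
    # Keep entries where mask != 0; substitute `value` elsewhere.
    return tensor.masked_fill(mask == 0, value)

def masked_mean(tensor, mask, dim):
    masked_tensor = replace_masked_values(tensor, mask, 0.0)
    total_tensor = torch.sum(masked_tensor, dim)
    count_tensor = torch.sum((mask != 0), dim)
    # Rows with zero count would divide 0 by 0; bumping the count to 1
    # makes the result 0.0 instead of nan.
    zero_count_mask = (count_tensor == 0)
    count_plus_zeros = (count_tensor + zero_count_mask).float()
    return total_tensor / count_plus_zeros
```

For example, a fully masked row yields 0.0 rather than nan, which then accumulates harmlessly with other gradients:

```python
masked_mean(torch.tensor([[1., 2., 3.], [4., 5., 6.]]),
            torch.tensor([[1, 1, 0], [0, 0, 0]]), dim=1)
# tensor([1.5000, 0.0000])
```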