I guess it's not strictly relevant to the first example, which has no interesting non_trainables, but the second example has batch norm and dropout. Additionally, I don't see where this non_trainable state would be managed. Re: batchnorm: each replica might have the same params, but are its running statistics the same? Each replica computes batch statistics from its own shard of the data, so that non-trainable state can diverge across replicas unless it's explicitly synchronized.
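
To make the concern concrete, here's a minimal sketch of how this kind of non-trainable state is usually threaded in JAX. I'm assuming Flax-style variable collections here; `TinyNet` and `train_step` are illustrative names, not anything from the examples above:

```python
import jax
import flax.linen as nn

class TinyNet(nn.Module):  # hypothetical toy model, purely illustrative
    @nn.compact
    def __call__(self, x, train: bool):
        x = nn.Dense(16)(x)
        # axis_name='batch' tells BatchNorm to average batch statistics
        # across replicas inside pmap; without it, each replica keeps its
        # own running stats, which is exactly the divergence concern.
        x = nn.BatchNorm(use_running_average=not train,
                         axis_name='batch')(x)
        return nn.Dense(1)(x)

def train_step(variables, x):
    # batch_stats is the non-trainable state: apply() returns the updated
    # collection rather than mutating anything in place, so the training
    # loop has to thread it alongside params explicitly.
    out, updates = TinyNet().apply(
        variables, x, train=True, mutable=['batch_stats'])
    return out, {**variables, **updates}

p_train_step = jax.pmap(train_step, axis_name='batch')
```

With `axis_name` set, the running stats stay identical across replicas because each update is a `lax.pmean` over the device axis. Dropout, by contrast, carries no such state; it only needs a per-replica RNG (in Flax, passed via `rngs={'dropout': key}`).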