Thanks for this great DeepSpeed feature. I am also running into the same error, both for 'DistilBertForSequenceClassification' object has no attribute 'backward' and for 'BertForSequenceClassification' object has no
RuntimeError: Failed to import transformers.models.flaubert.modeling_flaubert because of the following error (look up to see its traceback):
module 'signal' has no attribute 'SIGKILL'

Expected behavior: a flawless import, as usual.
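In case it helps narrow this down: the Python documentation lists signal.SIGKILL as available on Unix only, so one quick check is whether the interpreter in use exposes it at all. The snippet below is only a minimal diagnostic sketch, not a fix; the helper name has_signal_attr is hypothetical and not part of transformers or DeepSpeed, and it assumes nothing about where in the import chain SIGKILL is actually referenced.

```python
import signal
import sys


def has_signal_attr(name: str) -> bool:
    """Return True if the standard-library signal module defines `name` here.

    Hypothetical helper for diagnosis only; not part of any library.
    """
    return hasattr(signal, name)


# Report the platform together with whether SIGKILL is defined on it.
print(f"platform={sys.platform} SIGKILL available: {has_signal_attr('SIGKILL')}")
```

If this reports that SIGKILL is not available, the import failure probably comes from the platform rather than from the model code itself, and the full traceback should show which module references signal.SIGKILL.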