(1) The specially designed multi-head ProbSparse self-attention mechanism effectively highlights the dominant attention scores, which allows the TFT to substantially reduce the computational complexity of processing extremely long time series (see the sketch below); (2) The TFT trained with the knowledge-induced distillation strategy ...
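
To make point (1) concrete, the sketch below illustrates the ProbSparse idea: each query's "sparsity" is estimated from a random sample of keys, only the top-ranked (dominant) queries attend over the full sequence, and the remaining queries fall back to the mean of the values. This is a minimal single-head sketch assuming PyTorch; the function name `probsparse_attention`, the sampling factor, and the tensor shapes are illustrative assumptions rather than the paper's implementation.

```python
# Minimal single-head ProbSparse self-attention sketch (illustrative only).
import math
import torch

def probsparse_attention(Q, K, V, factor=5):
    """Q, K, V: (batch, seq_len, d_model). Returns (batch, seq_len, d_model)."""
    B, L, d = Q.shape
    scale = 1.0 / math.sqrt(d)

    # 1) Sample a subset of keys to estimate each query's "sparsity" cheaply.
    u = min(L, int(factor * math.ceil(math.log(L))))   # ~c*ln(L) sampled keys / active queries (assumption)
    idx = torch.randint(0, L, (u,))                    # uniform key sampling (assumption)
    K_sample = K[:, idx, :]                            # (B, u, d)
    scores_sample = torch.einsum('bld,bud->blu', Q, K_sample) * scale

    # 2) Sparsity measurement M(q, K) = max - mean of the sampled scores;
    #    a large M means the query's attention is far from uniform (dominant).
    M = scores_sample.max(dim=-1).values - scores_sample.mean(dim=-1)    # (B, L)
    top_idx = M.topk(u, dim=-1).indices                                  # dominant query indices

    # 3) Full attention is computed only for the dominant (top-u) queries.
    Q_top = torch.gather(Q, 1, top_idx.unsqueeze(-1).expand(-1, -1, d))  # (B, u, d)
    scores = torch.einsum('bud,bld->bul', Q_top, K) * scale
    attn = torch.softmax(scores, dim=-1)
    out_top = torch.einsum('bul,bld->bud', attn, V)                      # (B, u, d)

    # 4) Lazy queries receive the mean of V as a cheap fallback.
    out = V.mean(dim=1, keepdim=True).expand(-1, L, -1).clone()
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), out_top)
    return out

# Usage: only ~c*ln(L) queries attend over the full key set, so the cost drops
# from O(L^2) toward O(L log L) for long sequences.
x = torch.randn(2, 512, 64)
print(probsparse_attention(x, x, x).shape)  # torch.Size([2, 512, 64])
```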