Detailed usage

The first run takes some time to compile the CUDA extensions.

Train (head)

By default, we load data from disk on the fly. We can also preload all data to the CPU/GPU for faster training, but this is very memory-hungry for large datasets.

--preload 0: load from disk (default, slower). ...
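
The trade-off between loading on the fly and preloading can be sketched as below. This is a minimal illustration under stated assumptions, not this repository's actual loader: `FrameDataset`, `make_dummy_files`, and the boolean `preload` argument are hypothetical names, and small pickle files stand in for real training data.

```python
import os
import pickle
import tempfile


class FrameDataset:
    """Hypothetical dataset contrasting lazy and preloaded access."""

    def __init__(self, paths, preload=False):
        self.paths = list(paths)
        # With preload=True every sample is read once, up front, and kept
        # in memory. Later access is fast, but memory use grows with the
        # size of the dataset.
        self.cache = [self._read(p) for p in self.paths] if preload else None

    @staticmethod
    def _read(path):
        # Read and deserialize a single sample from disk.
        with open(path, "rb") as f:
            return pickle.load(f)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        if self.cache is not None:
            # Preloaded: serve the sample from memory.
            return self.cache[idx]
        # On the fly: hit the disk on every access.
        return self._read(self.paths[idx])


def make_dummy_files(directory, count):
    """Write `count` tiny placeholder samples and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"frame_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump({"index": i}, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = make_dummy_files(tmp, 3)
        lazy = FrameDataset(files, preload=False)
        eager = FrameDataset(files, preload=True)
        print(lazy[0], eager[0])
```

The lazy variant keeps only file paths in memory and pays the disk cost on each access, while the preloaded variant pays it once at construction time and holds every sample afterwards.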