+        drop_caption_rate (float, optional): Rate at which captions are
+            dropped, used for training. Defaults to 0.0.
+        phase (str, optional): Subset of the dataset used for a given phase.
+            Can be set to `train`, `test` or `val`. Defaults to 'train'.
+        year (int, optional): Version of COCO dataset,...
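
To make the `drop_caption_rate` description concrete, here is a minimal sketch of one way such a rate could be applied per sample. The helper name `maybe_drop_caption`, its `rng` parameter, and the choice of an empty string as the "dropped" caption are all assumptions for illustration; the diff does not show how the dataset class actually implements this.

```python
import random


def maybe_drop_caption(caption, drop_caption_rate=0.0, rng=random):
    """Hypothetical helper: return '' with probability drop_caption_rate."""
    if not 0.0 <= drop_caption_rate <= 1.0:
        raise ValueError('drop_caption_rate must be in [0, 1]')
    # rng.random() lies in [0.0, 1.0), so a rate of 0.0 never drops
    # the caption and a rate of 1.0 always drops it.
    if rng.random() < drop_caption_rate:
        return ''  # assumption: a dropped caption becomes an empty string
    return caption


# With the default rate of 0.0 the caption is always kept.
print(maybe_drop_caption('a cat on a sofa'))
```

Passing a seeded `random.Random` instance as `rng` keeps the behaviour reproducible in tests.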