Aiming at the problem of low accuracy in monocular vision image depth estimation, a monocular image depth estimation method based on Transformer and convolutional neural network is proposed. First, ResNet-50 is used as the backbone network of the encoder-decoder network to extract image fea...