Grape is cultivated as a cash crop throughout the world. Unlike many other fruits, grape clusters differ in shape from one cluster to the next. The region occupied by a grape cluster is sparse relative to the image size, which introduces considerable error. Finding the shape and estimating the weight of a grape cluster is therefore challenging even for modern algorithms. In this paper, we propose a deep learning regression model that combines the basic structures of U-Net and VGG-16 with attention modules. These architectures are characterized by sequences of convolution, max-pooling, and average-pooling layers together with concatenation operations. The proposed model is capable of predicting the weight of grape clusters present in images with reduced error. © 2022 IEEE.