Advances in computer vision methods for agriculture continue to gain attention because they offer a safe, non-invasive way to estimate plant well-being in agricultural food production environments. This paper reviews some of the latest advances and methods in this field, focusing on estimating plant characteristics and overall plant well-being from a single image or a set of plant images on IoT Edge devices. Vision Transformer (ViT) and Convolutional Neural Network (CNN) models are currently widely adopted and compared in this field. ViT, the more recent of the two architectures, is being researched in the context of IoT networks, and its potential for deployment on the Edge is being evaluated. However, more research is needed to overcome the limitations and drawbacks of transformer models. Current research shows that optimization and compression techniques exist for deploying transformer models on the Edge, which would allow them to be adopted more widely across computer vision tasks in agriculture. A hybrid model approach is also emerging, in which the advantages of CNN and ViT are combined to reach higher accuracy and better performance.