It depends on the model but I’ve seen image generators range from 8.6 wH per image to over 100 wH per image. Parameter count and quantization make a huge difference there. Regardless, even at 10 wH per image that’s not nothing, especially given that most ML image generation workflows involve batch generation of 9 or 10 images. It’s several orders of magnitude less energy intensive than training and fine tuning, but it is not nothing by any means.
Yeah you’d really only say it on the theoretical side of things, I’ve definitely heard it in research and academia but even then people usually point to the particulars of their work first