Can you reliably tell if an image is AI-generated? I'm starting to doubt it

I’ve been running an informal study for the past month using a set of 40 images, a mix of AI-generated and human-created, across different styles and subjects. I ran the set through four different AI image detection tools, and I also asked a group of colleagues (academics, not imaging specialists) to make their best guess with no tool assistance.

The results were humbling.

The detection tools disagreed with each other on 31% of the images. That’s not a small number. On those contested images, the tools’ confidence scores also varied widely. When one tool flags an image as 94% likely AI-generated and another returns 38% for the same image, that’s not useful information. It’s noise that looks like signal.
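For what it’s worth, here’s roughly how I counted disagreement. The scores below are made-up placeholders, not my real data, and I’m calling an image "contested" when the tools’ thresholded verdicts don’t all match (a 0.5 threshold is assumed):

```python
# Sketch of the disagreement computation. Tool names and scores
# are hypothetical placeholders, not the actual study data.
# A tool's verdict is its confidence score thresholded at 0.5;
# an image counts as contested if the verdicts don't all match.

scores = {
    "img_001": {"tool_a": 0.94, "tool_b": 0.38, "tool_c": 0.71, "tool_d": 0.55},
    "img_002": {"tool_a": 0.12, "tool_b": 0.09, "tool_c": 0.22, "tool_d": 0.18},
    # ... one entry per image in the test set
}

THRESHOLD = 0.5

def is_contested(tool_scores: dict[str, float]) -> bool:
    # Collect each tool's boolean verdict; a mix of True and
    # False means the tools disagreed on this image.
    verdicts = {score >= THRESHOLD for score in tool_scores.values()}
    return len(verdicts) > 1

contested = [img for img, s in scores.items() if is_contested(s)]
rate = len(contested) / len(scores)
print(f"Contested: {len(contested)}/{len(scores)} ({rate:.0%})")
```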

My colleagues, the human group, performed at about chance on photographic images but noticeably better on illustrations and digital art, where stylistic cues were apparently more recognizable. The tools showed the inverse pattern, performing somewhat better on photographs than on stylized images.
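The per-category breakdown is just grouped counting. Here’s a sketch with hypothetical records, in case anyone wants to structure their own comparison the same way:

```python
from collections import defaultdict

# Hypothetical judgment records: (rater_group, image_category, correct).
# In the real study there is one record per (rater, image) pair.
records = [
    ("humans", "photo", False),
    ("humans", "illustration", True),
    ("tools", "photo", True),
    # ...
]

# (group, category) -> [number correct, total judged]
tally = defaultdict(lambda: [0, 0])
for group, category, correct in records:
    tally[(group, category)][0] += int(correct)
    tally[(group, category)][1] += 1

for (group, category), (right, total) in sorted(tally.items()):
    print(f"{group:6s} {category:12s} {right}/{total} = {right / total:.0%}")
```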

What this suggests to me is that neither human judgment nor current detection tools are reliable enough to use as sole arbiters of authenticity. They should probably be used together, and even then with significant caution about what claims you’re willing to make based on the output.

The implications for publishing and academic contexts are real. If you’re using an image detection tool to make decisions about submitted work, you should understand what the tool actually tells you and what it doesn’t. A detection result is a probability estimate, not a determination.
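One way to make that concrete in a review workflow is to treat the scores as a triage signal rather than a verdict. This is only a sketch of a possible policy, not a calibrated procedure: the thresholds are arbitrary, and scores from different tools aren’t directly comparable anyway.

```python
def triage(tool_scores: list[float],
           high: float = 0.9, low: float = 0.1) -> str:
    """Map a set of detector scores to a review action.

    A sketch of one possible policy, not a recommendation:
    thresholds are arbitrary and scores from different tools
    are not directly comparable.
    """
    if all(s >= high for s in tool_scores):
        return "escalate: consistent high-confidence AI signal"
    if all(s <= low for s in tool_scores):
        return "pass: consistently low scores"
    return "inconclusive: tools disagree, needs human review"

print(triage([0.94, 0.38, 0.71]))  # -> inconclusive
```

The point of a policy like this is that "inconclusive" should be a common, expected outcome, not an error state.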

Has anyone else been running systematic comparisons? Curious whether the disagreement rates I’m seeing hold up across different datasets.