Characteristics of AI-generated images - is forensic analysis possible?

Not sure if this verges into Cafe Society territory, but looking for some factual-ish answers, so here goes -

A friend came across a maybe slightly risqué image of a young woman with blue hair wearing a tank top. To me, and to a number of other people I’ve asked, this is clearly an AI generated image à la Midjourney or one of the other artbots, while she was leaning more toward ‘Insta filters’ or similar manipulation of an image of an actual person. There were quite a few things that stood out for me as characteristic of an AI image, divided roughly into ‘anatomy’ and ‘visual effects’ (spoilers at end). I have the sense that AI systems are bad at anatomy other than faces, and that they stitch backgrounds together with much less fidelity than foregrounds, but I’d be hard pressed to quantify that other than saying ‘that looks off’.

There’s an IMHO side to this question, in that I’m curious as to whether the things I noticed line up with other people’s perceptions, or if others notice things that I didn’t, but I’m mainly interested in the forensic side of things - if an image like this was presented at trial as evidence of something, are there actual, quantitative diagnostic tests that can be applied to such images to ‘prove’ or ‘suggest’ that they are AI-generated from a million training images as opposed to manipulated/filtered images or ‘raw’ images? Or does it come down to a visual artist saying ‘my training and experience leads me to conclude that this is a (whatever) image because of the pixels’? I would suspect that any conversation we have about this now will be obsolete in a year, much less five or ten years, but what’s going on today?

FWIW, the things that I noticed about the linked image:

In no particular order-

Anatomy: Two right clavicles. Possible traces of necklace/pendant at sternal notch. Double right earlobe. Neck thinner than the interpupillary distance. Cleavage oddly rendered. Double row of eyelashes on the right eye. Pupils not aligned, and different light reflections between the two eyes. Odd morph of the right lower eyelid. Odd skin folds in the armpit.

Other visual effects: ‘Painterly’ hair, with strands emerging from/going to nowhere. Choker and necklace look like solid object/tattoo hybrids. Trim along ceiling doesn’t line up from left to right. Tank top strap gets weird as it goes over the shoulder. Varying focus for things that should be in the same plane.

Nevermind, I fixed it.

Watermarking is the most obvious way.
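For what it’s worth, the basic idea is simple enough to sketch. Here’s a toy least-significant-bit watermark in Python - not any actual generator’s scheme, and real systems (where they watermark at all) hide the mark in the frequency domain so it survives resizing and recompression, which this wouldn’t:

```python
# Toy illustration of invisible watermarking: hide a bit string in the
# least-significant bits of an image, then recover it. Real schemes are
# far more robust; this just shows the concept.
import numpy as np

PAYLOAD = "AI-GEN"  # hypothetical provenance tag

def embed(img: np.ndarray, tag: str) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(tag.encode(), dtype=np.uint8))
    flat = img.flatten()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(img.shape)

def extract(img: np.ndarray, n_chars: int) -> str:
    bits = img.flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode(errors="replace")

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
marked = embed(photo, PAYLOAD)
print(extract(marked, len(PAYLOAD)))  # -> "AI-GEN"
```

A single JPEG re-save would wipe out an LSB mark like this, which is exactly why production watermarks live in transform coefficients instead of raw pixels.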

But are you assuming the creator never wants to be found out, and also that the investigators don’t have access to the potential generating model?

Yes, I assume that the police or whoever are just working with the image itself. There’s no metadata conveniently attached. I don’t know if the police or NSA have access to the servers at Midjourney or DALL-E to reverse-engineer stuff.

I know there’s a GPT-3 vs. not-GPT-3 classification model that works on the distribution of complexity in the text. Machine-generated text is fairly uniform in complexity, while people write simply, then complexly, then simply again, etc. I imagine you could do something similar with images.
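As a toy illustration of that idea - using sentence-length variance as a crude stand-in for the model-based complexity scores real detectors compute:

```python
# Crude stand-in for the "uniform complexity" heuristic: real detectors
# score each sentence with a language model and look at the variance of
# those scores ("burstiness"); here sentence length is a cheap proxy.
import re
import statistics

def burstiness(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Higher values suggest human-style alternation between short and long
# sentences; suspiciously low variance is one (weak) machine-text signal.
print(burstiness("Short one. Then a much longer, meandering sentence "
                 "with several clauses. Tiny."))
```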

These aren’t necessarily exclusive, as you can upload a photo into Midjourney to use as a starting point. In fact, I’m fairly confident that was the case here (just based on knowing some MJ users who do this a lot and seeing their results).

True, and that would add a layer of complexity - can you separate the elements derived from a seed image from the elements added by an AI (or even a filter app)?

Nosing around a bit on the internet, I found multiple bits of software that claim deepfake detection, but almost all are a matter of “looking at the pixels” rather than using “physical” evidence such as body proportions. The one outlier is from Intel, where the software looks for minute shifts in color caused by blood flow - but that is for video.
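If you want a feel for the blood-flow approach (remote photoplethysmography), here’s a minimal sketch. It assumes you already have the frames as a numpy array and know where the face is - Intel’s actual FakeCatcher pipeline is proprietary and far more involved:

```python
# Minimal sketch of the blood-flow idea: average the green channel over
# a face region frame by frame, then look for a spectral peak in the
# normal heart-rate band (~0.7-3 Hz, i.e. ~42-180 bpm).
import numpy as np

def pulse_peak_hz(frames: np.ndarray, fps: float, roi) -> float:
    y0, y1, x0, x1 = roi
    signal = frames[:, y0:y1, x0:x1, 1].mean(axis=(1, 2))  # green channel
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    band = (freqs > 0.7) & (freqs < 3.0)
    return float(freqs[band][np.argmax(spectrum[band])])

# Demo with synthetic frames carrying a 1.2 Hz (72 bpm) "pulse":
t = np.arange(300) / 30.0  # 10 s at 30 fps
frames = np.full((300, 64, 64, 3), 128.0)
frames[..., 1] += 2 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(pulse_peak_hz(frames, fps=30.0, roi=(16, 48, 16, 48)))  # ~1.2
```

A real face video should show a consistent peak; a synthetic face often lacks one, or the apparent pulse disagrees across skin regions.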

The arms race begins. Soon a deepfake AI will boast realistic blood flow color shifting.

At last…a question for me. I provide video forensic services and testify as an expert witness, mainly in civil cases.

And the answer is…it depends. We have some tools that help, but digital files are digital files. If you provide a document examiner with a bad fax, they’ll usually explain that they can’t tell you a lot. If someone edits a video carefully and transcodes it, I might be SOL re providing an opinion.

Which is why provenance is such an important issue. Sometimes my report is simply, “This digital video file is (or isn’t) consistent with the recording device which was purportedly used to create it.” IANAL, but I’ve been involved in enough cases to know that an attorney can’t simply hold up a picture or play a video in court and say, “See for yourself!” (OK, I guess they can, but it wouldn’t be very helpful to their argument.)

Checking to see if I understand this - your report would be limited to something like “the defendant said that he took a photo of the killer with his iPhone 22, but the image in evidence could not have been shot on an iPhone 22 because (x, y, z)”? Not so much commenting on the face in the photo but on some technical quality of the image itself?

It often is just that. During the discovery phase, a video file might be provided to opposing counsel and we find that it is not an exact copy of the original file (“best evidence”). The aspect ratio, resolution, frame rate, and/or other characteristics may have been intentionally or unintentionally altered. The plaintiff, for example, may submit a video that purports to show the speed of a vehicle prior to a collision. We may examine the video and conclude that the original variable frame rate MOV video file was converted to a fixed frame rate AVI and that an unidentified resampling process was used.
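The first pass of that kind of exam is scriptable. Here’s a rough sketch using ffprobe’s JSON output - the file name is made up, and a real examination would compare against reference recordings from the actual device rather than eyeballing fields:

```python
# First-pass container check: pull stream metadata with ffprobe and
# compare against what the purported recording device should produce.
import json
import subprocess

def probe(path: str) -> dict:
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

info = probe("evidence.avi")  # hypothetical exhibit
video = next(s for s in info["streams"] if s["codec_type"] == "video")
print(video["codec_name"], video["width"], video["height"],
      video.get("r_frame_rate"), video.get("avg_frame_rate"))
# A variable-frame-rate original converted to a fixed-rate AVI will often
# show r_frame_rate == avg_frame_rate plus a container/codec mismatch.
```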

Obviously, we’re not limited to this subject when forming opinions, but the work can often be that simple.

And you’d be surprised how often we get videos that are actually shot by a state trooper/investigator aiming their cell phone at a video monitor in a gas station. Many owners of video surveillance systems are either unwilling to provide a digital copy of the video or do not know how to do so. It’s challenging to help estimate vehicle speed when you have a 60 fps handheld video of a 15 fps video clip shown on a 75 Hz monitor.

One weirdness I noted is that the pendant has no depth to it, as if it’s a tattoo.

Of course, it could actually be a tattoo.

My vote is AI generation.

First thing I noticed. (Ok, second thing). Looks like a round diamond in a round setting, placed exactly at the top of a tattoo.

There are a few other oddities I noticed, along with the list provided above. Why are the strings - I assume bra straps - why are there a pair of them, and why are they loose? Odd, but not impossible. The upper arm tattoos, from what we can see, are not balanced - the hidden one on our right seems much higher. I’m not into tattoos; is that normal? Why does she have smile line creases on the right but not the left, without an imbalance in the mouth shape? Odd that someone with big tattoos shows no sign of an earring (or many) on the exposed bit of earlobe. Why is there no indent where the shoulder strap of the tank top goes over the shoulder - is it not supporting anything? (The bra strings sure aren’t.) Are those things super perky or super weightless? Neither seems plausible. The shoulder strap is not bent or twisted - too perfect - and there are no wrinkles on the tank top at all. There’s a hint of ribbing in the tank top material when I expand the photo, but looking very closely, the ribbing shows no indication of stretching that I can see, suggesting it was custom tailored for her size(s) exactly. And the dark roots of the hair, but quite a large uncoloured area by the front of the part…

None of this is definitive, but added to the list from the previous post, points to AI generation as the more plausible explanation.

I assumed that was fashion :smiley:
Her left iris is an oval. The right one doesn’t look correct either, but the left is worse.

So I assume this is the sort of answer ZonexandScout was describing. There are so many anomalies that it is very, very likely a generated image - but we cannot be 100% sure, and how it was made, with what software, and how much it is based on a single source photo, we cannot say definitively without some source comparison or provenance.

Good summary.

When evidence is offered, a foundation for it must be laid. Video or graphic material, like all evidence, must be a fair and truthful representation. Provenance is an important aspect of laying the foundation.

However, this hardly ever comes up as an issue during the trial itself. More typically, the plaintiff will offer a video that they intend to submit as evidence. The defendant may use an expert witness to provide a report that states (among other things), “This video has been transcoded and has compression artifacts not in the original. The aspect ratio in this video is not consistent with the camera used to record it. The frame rate is not consistent with an original video made using this camera. The images have been enhanced using an unknown method to smooth the image and alter the pixels. It is not a reliable representation of the original.” This will go back and forth for some time and may actually reach a pre-trial hearing as to admissibility.

Clearly what is needed is an AI that can tell real photos from created ones. I’ll bet it’s on the way.

Unfortunately an AI that’s really good at picking out fakes is exactly what you need to train the original AI to make even more undetectable fakes.
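That’s not hypothetical - it’s literally how generative adversarial networks (GANs) are trained. Today’s image generators are mostly diffusion models rather than GANs, but the dynamic is the same whenever a detector’s output feeds back into training. A toy 1-D PyTorch version:

```python
# The arms race in miniature: a GAN. The discriminator learns to spot
# fakes, and its gradients are exactly what trains the generator to
# fool it. Here "real photos" are just samples from a 1-D Gaussian.
import torch
import torch.nn as nn

def real_dist(n):
    return torch.randn(n, 1) * 0.5 + 2.0  # "real" data centered at 2.0

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real, noise = real_dist(64), torch.randn(64, 1)
    # 1) Train the detector to separate real from generated samples.
    d_loss = (bce(D(real), torch.ones(64, 1)) +
              bce(D(G(noise).detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Train the generator to make the detector call its fakes real.
    g_loss = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(256, 1)).mean().item())  # drifts toward ~2.0
```

The better the detector gets, the more informative its gradients are - so publishing a great fake-detector hands the forgers a better training signal.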

That’s been around since the 90s. Of course, back then it was more about telling when an image was Photoshopped than generated from whole pixel cloth.
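One classic trick from that era is error level analysis (ELA): re-save the JPEG and see where the recompression error is uneven, since pasted-in regions often have a different compression history. A rough Pillow sketch - the file name is hypothetical, and ELA is a heuristic that modern editing (and AI imagery) defeats easily:

```python
# Error level analysis: recompress a JPEG and amplify the difference.
# Regions spliced in from another source often recompress differently.
import io
from PIL import Image, ImageChops

def ela(path: str, quality: int = 90) -> Image.Image:
    original = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    original.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(original, resaved)
    # Amplify the residual so compression inconsistencies are visible.
    extrema = diff.getextrema()
    scale = 255.0 / max(1, max(hi for _, hi in extrema))
    return diff.point(lambda px: int(px * scale))

# ela("suspect.jpg").show()
# Uniform noise = consistent history; bright patches = worth a closer look.
```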

I’d guess such software relies on a proprietary server you pay to access, rather than a localized version that could be dissected for science. In any event, we already know what to look for, and the real barrier is just making image AI that doesn’t add three extra spider-fingers or extra clavicles or make weird blurry bits in the window shades behind the subject. We’ll get there sooner or later - I suspect sooner, based on how much the tech has advanced in even the last 12-18 months. After that, it’ll really come down to “seeing the pixels”.

Edit: There are also issues with the complexity of the image. Something that looks like an Instagram selfie in a bedroom is a lot easier to make right now than someone specific stabbing someone else specific with a pair of scissors.