It’s possible that yours just isn’t very visual. I mean, people who are congenitally blind and deaf think things up all the time – it’s not like everything has to be a symphony of all the senses. If you can run music like a tape in your head, you’re probably doing okay. 
I don’t know if my explanation of it is very typical. I have this quasi-eidetic thing going, and I am informed on a pretty regular basis that this is “weird”, or “cool”, or “frigging scary”. Other eidetickers inform me that I’m odd because I don’t close my eyes to do it either – I can’t do remembered snapshots with my eyes closed, in fact, and I prefer to keep them open for user-generated visualizations too. Closing them seems to short-circuit something that needs to be on for me to pseudo-see stuff. A lot of people, by contrast, find that having external input gets in the way of rummaging around in their own brain for something.
The way it goes for me is that I get the image by doing something akin to ‘miming’ the experience of seeing it. You can pretend to write by pinching your fingers together as if you’re holding a pencil, and moving your hand the same way you would to mark letters on the page if the pencil and the paper were real. You’re missing literally everything you physically need in order to write, save your hand, but you can remember what it is and where it goes and the process of using it, so you can pretend without having all the props.
The best I can explain it is that, to see an eidetic snapshot, I do the same thing I did the last time I was looking at the scene. My eyes move as I focus on different parts of it, and I bring up context like how I was feeling, what I was hearing, or what I’d been doing right before. I also have a good sense of how things are embedded in space, and that helps a lot. All of this recreates the circumstances, and thereby recreates the ‘perception’ of the thing. I don’t know if it would be strictly accurate to say that I am literally seeing a red car out in front of me – that would be a hallucination, and much more worrying.
I think I’m recreating the circumstances under which the neurons that fire when I see a red car are apt to fire again.
With imaginary things, what I do is use my experience to construct that contextual mockup of what it would have been like to see it in the first place, and that helps me build up the visualization in my head. Otherwise it’s the same as above.
As an aside, I don’t know how accurate it is, but I’ve heard it said that you can identify people who can pull up full or partial eidetic snapshots by asking them to describe their memory of a static scene. People who use the past tense have the scene stored as a description of past experience. People who use the present tense are ‘seeing’ what you asked them to describe at the moment they’re describing it. They’re both valid answers, but gotten through different methods.