One approach to start with:
Take the raw image data of an 8-bit monochrome image and lop off the 2 least significant bits. That gives you a similar image (with less dynamic range), but at 6 bits per pixel, i.e. 64 possible values, so each pixel fits in one symbol of a 64-character alphabet. If you viewed it with the right stride it would be recognizable. You could improve on this by remapping each of the 64 unique values so that higher values have more ‘ink’.
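Here is a rough sketch of that idea in Python. The `RAMP` string and the function names are made up for illustration, and the ink ordering is only a guess that depends on the font. Putting a newline after each row is also my assumption about what "the right stride" looks like in a text viewer.

```python
# Hypothetical 64-character ramp, roughly ordered from least to most "ink".
# The ordering is a guess and is font-dependent; tune it for your display.
RAMP = (
    " .',:;-_"
    "~^!i|l/+"
    "<>()?=7t"
    "rcvxzjf1"
    "unysJLCT"
    "oaekhY3F"
    "ZXUPSA5G"
    "dbqO0B#@"
)
assert len(RAMP) == 64 and len(set(RAMP)) == 64


def encode_plane(pixels: bytes, width: int) -> str:
    """Map each 8-bit grayscale pixel to one ramp character, one text line per row."""
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of width")
    chars = [RAMP[p >> 2] for p in pixels]  # drop the 2 least significant bits
    rows = ["".join(chars[i:i + width]) for i in range(0, len(chars), width)]
    return "\n".join(rows)


def decode_plane(text: str) -> bytes:
    """Recover the 6-bit values, shifted back up into the 8-bit range."""
    index = {c: i for i, c in enumerate(RAMP)}
    return bytes(index[c] << 2 for c in text if c != "\n")
```

Note that the lightest symbol here is a space, so anything that strips trailing whitespace would damage the data. Swap it for a visible character if that matters.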
The extra 2 bits that you lopped off could be packed, three pixels at a time, into 6-bit (64-value) symbols in another plane. And to support RGB, convert to a YUV format first. The Y channel is the monochrome image, and the U and V channels would be encoded separately.
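A minimal sketch of both steps, again in Python. The function names are hypothetical, packing three pixels per symbol is just one way to fill 6 bits, and the colour conversion assumes the full-range BT.601 coefficients used by JPEG/JFIF (other YUV variants use different constants).

```python
def pack_low_bits(pixels: bytes) -> list[int]:
    """Pack the 2 low bits of every 3 pixels into one 6-bit value (0..63)."""
    padded = pixels + bytes(-len(pixels) % 3)  # zero-pad to a multiple of 3
    return [
        ((padded[i] & 3) << 4) | ((padded[i + 1] & 3) << 2) | (padded[i + 2] & 3)
        for i in range(0, len(padded), 3)
    ]


def unpack_low_bits(packed: list[int], count: int) -> list[int]:
    """Inverse of pack_low_bits; count is the original number of pixels."""
    out = []
    for v in packed:
        out.extend(((v >> 4) & 3, (v >> 2) & 3, v & 3))
    return out[:count]  # discard the padding


def rgb_to_yuv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Full-range BT.601 (JFIF-style) conversion, each output clamped to 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(c))) for c in (y, u, v))
```

A full 8-bit pixel is then rebuilt as `(high6 << 2) | low2`, and each of the Y, U and V planes can go through the same 6-bit plus 2-bit split.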
Or just encode the image file normally, but include a 64-level, ink-mapped monochrome version in the header. You could even include newlines, so that each image row sits on its own line of text.