This is probably a very difficult challenge (for anything/anyone), unless someone has already reverse engineered it (did you check GitHub?). Otherwise, you might have to figure out the binary encoding first — there are people who do that for fun or for a living, reverse engineering programs and their file formats.
I am fairly confident LLMs can’t do this yet. Binary encodings are not natively “readable” by LLMs, so they would have to go through some sort of parsing/decoding step first.
It’s not something you could just throw money at unless (maybe) you had a sufficiently large training dataset of them (tens of thousands? maybe more?) to train a new LLM as though the binary were a language of its own, i.e., training it to understand that binary encoding as just another language. That might be doable if the binary were a simple bit-for-bit representation of the underlying data, as opposed to some sort of compression or encoding scheme. It would not be easy.
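To make that "simple bit-for-bit representation" idea concrete, here's a toy sketch of what parsing such a format looks like once you know the layout. The magic bytes, field names, and record layout here are entirely made up for illustration; they are not your format.

```python
import struct

# Hypothetical layout: a 4-byte magic string, a 4-byte record count,
# then fixed-size 10-byte records (two signed ints + a flags field).
# All names and offsets are invented for this example.
def parse_blob(data: bytes):
    magic, count = struct.unpack_from("<4sI", data, 0)
    records = []
    offset = 8  # past the 8-byte header
    for _ in range(count):
        x, y, flags = struct.unpack_from("<iiH", data, offset)
        records.append({"x": x, "y": y, "flags": flags})
        offset += 10  # size of one record
    return magic, records

# Build a sample blob and parse it back
blob = (
    struct.pack("<4sI", b"FMT1", 2)
    + struct.pack("<iiH", 10, 20, 1)
    + struct.pack("<iiH", -5, 7, 0)
)
magic, recs = parse_blob(blob)
```

If the real format is compressed or uses some custom encoding, this kind of direct fixed-layout parse won't work and the reverse-engineering effort goes up a lot, which is the distinction the paragraph above is drawing.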
This should be more doable. Can you post some samples (both the files themselves, and the layout they are supposed to represent)? Or DM me with them?
LLMs might be able to handle them out of the box (if it’s a format they’ve seen before or can relatively easily reason about). Otherwise, with enough sample inputs and outputs, they can at least help write converters and tests and then iterate on them (with agentic workflows) until the converters get things right and the outputs pass some validation testing (which they can also help write). It might take a lot of prompting, trial and error, and a lot of notes for the LLM, though — and probably not through a simple chatbot interface, but something more like Claude Code or another agentic workflow.
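The "iterate until it passes validation" loop above basically means giving the agent a harness like this: pairs of sample inputs and known-good outputs, and a check it has to make green. The `convert` function here is a placeholder stand-in for whatever converter the LLM writes and rewrites.

```python
# Sketch of a validation harness an agent can iterate against.
# convert() is a placeholder for the real converter; here it just
# uppercases text so the harness itself is runnable end to end.
def convert(raw: bytes) -> str:
    return raw.decode("utf-8").upper()

def check_samples(samples: dict) -> list:
    """Run the converter on each sample input and collect mismatches."""
    failures = []
    for raw, expected in samples.items():
        got = convert(raw)
        if got != expected:
            failures.append(f"input {raw!r}: expected {expected!r}, got {got!r}")
    return failures

# In practice these would be your real sample files and target layouts.
failures = check_samples({b"layout a": "LAYOUT A", b"layout b": "LAYOUT B"})
```

The agent's job then reduces to rewriting `convert` until `failures` is empty, which is exactly the kind of closed feedback loop these tools are good at.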
Anyway, feel free to post/send some of those if that’s a possibility and I can look into it for you in ChatGPT, Claude, and Gemini.
Or you can try it yourself with coding agents (like Claude Code, OpenAI Codex, Google Antigravity, Cursor, Windsurf, etc.). Those are good for more complex coding tasks like this if the basic chatbots don’t work well enough.
Philosophically, computer programming is something LLMs excel at, both because it’s essentially a form of language translation, and because it has immediately verifiable outputs (in a way that medical, legal, and most other questions don’t).
They still do frequently hallucinate while coding, but the agentic workflows are not just pure LLMs. They can call other tools, write tests, run those tests, etc., again and again until the output actually verifiably compiles/builds and actually works. It’s less of a Q&A type chat interface and more of an orchestration system for agents and sub-agents that prompt each other, test their outputs, repeat until it works, etc.
So it’s not really the same problem domain as in most other fields. Here you’re not asking them to find truthiness, merely to build and rebuild code until it becomes verifiably correct/functional. Programming is probably the single most powerful use for LLMs right now, which is why there are so many companies and models for agentic coding all competing with each other.
As far as models go, I think Claude and Gemini are currently in the lead, but that changes every few weeks — there are numerous benchmarks available you can look up if you really care. But your prompting and context notes will make a big difference in something like this, maybe even more so than raw model performance. Even if nothing can “one-shot” that particular format, you can probably get a working converter written with a few hours of effort (mostly waiting + occasional prompting).