Okay, this is a rather technical and odd question, but bear with me.
I’m not going to name and shame the programmer, but I’m using a program to keep track of a whole lot of data about genealogy and DNA. From time to time I download a big file from a website and import it into this program; only some of the data will be new, and the rest is duplicates that should just be merged or ignored.
At some point this program stopped importing data containing non-Latin characters correctly: Bøb Båbcatcher becomes B√∏b B√•bcatcher, with all the issues that involves*. The programmer blames the platform, and it’s possible the platform really is at fault, but according to the Xojo website the framework uses UTF-8 internally if nothing else is specified, and my text editor both identifies the file as UTF-8 and displays it correctly as such.
I’m hoping there’s just one step where the encoding is handled incorrectly, in which case changing the encoding of the file before import should be a simple fix, but after trying a handful of options I still haven’t found the right one.
So my actual question is: Is there a way to look at the resulting scrambled characters and figure out the encoding the program is using during import?
I mean: a UTF-8 ø (the byte pair 0xC3 0xB8) is read, something happens, probably a single encoding mismatch somewhere, and √∏ comes out the other end, itself displayed as UTF-8.
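To make that concrete, here’s a minimal sketch of the round trip I suspect, in Python (used purely for illustration; the program itself obviously isn’t Python, and "mac_roman" below is just one guess at the mystery decoder, swap in any codec you want to test):

```python
# Reproduce the suspected round trip: correct UTF-8 bytes are
# misread under some other (unknown) single-byte encoding.
original = "Bøb Båbcatcher"
utf8_bytes = original.encode("utf-8")  # ø -> 0xC3 0xB8, å -> 0xC3 0xA5

# Decode those same bytes as if they were in a different encoding.
# "mac_roman" is only a guess; substitute other codecs to test them.
garbled = utf8_bytes.decode("mac_roman")
print(garbled)  # if this particular guess is right, prints "B√∏b B√•bcatcher"
```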
Experimenting with other encodings gives different scramblings, so I’m sure Alan Turing could work it out, but I’m hoping someone here can do it quicker than I can build my own Bletchley Park; the sketch below is roughly what I mean.
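In case it helps anyone answer, the brute-force version of that search would look something like this: walk every codec Python knows about and report each one that turns the correct name into the exact scrambling I see.

```python
import encodings.aliases

correct = "Bøb Båbcatcher"      # what the file contains (valid UTF-8)
observed = "B√∏b B√•bcatcher"   # what the program shows after import

utf8_bytes = correct.encode("utf-8")

# Try every codec Python ships with as the hypothetical "wrong" decoder.
for codec in sorted(set(encodings.aliases.aliases.values())):
    try:
        if utf8_bytes.decode(codec) == observed:
            print("match:", codec)
    except Exception:
        # Skip codecs that can't decode these bytes, or that aren't
        # byte-to-text codecs at all (base64, zlib, and friends).
        continue
```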
*It looks ugly, it breaks searches on names containing these characters, it makes it impossible for the program to merge duplicates when mass-importing data, etc.