Adium

Profile: Mark Muir (quatre)

Comment Count 4 comments
0 xtras

Latest comments

# by quatre on 01/17/06 at 04:06:58

The issue with the daggers († and ‡) comes down to Messenger's log formats. Some versions mark the end of each conversation with an invalid Unicode sequence, whereas more recent versions instead mark the sender line with a (different) invalid Unicode character in the middle of the "Display name says:" text. Other versions do neither, and instead mark the beginning of the first line of a conversation with a sequence of daggers. My parser can't determine beforehand which type it will encounter, so it accepts any of them. There may be other formats that I haven't looked at in a hex editor. Most of the time this works fine, but sometimes a mix-up occurs, which leaves some of those invalid Unicode characters, or extra daggers, in the output. I haven't got a fix for this: the Messenger file formats are just too idiotic to parse in an easy manner. This is also why the search feature in Messenger is so slow.
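
As a rough illustration of the "accept any marker" approach described above, here is a minimal Python sketch. It is not the actual parser: the comment does not say which invalid code points Messenger writes, so the set treated as invalid here (lone surrogates and Unicode noncharacters) and the name `clean_line` are assumptions made for the example.

```python
import re

# Runs of daggers († U+2020, ‡ U+2021) at the very start of a line.
LEADING_DAGGERS = re.compile(r"^[\u2020\u2021]+")

# ASSUMPTION: "invalid Unicode" is modelled as lone surrogates and
# noncharacters. The real marker values are not given in the source.
INVALID_CHARS = re.compile(r"[\ud800-\udfff\ufdd0-\ufdef\ufffe\uffff]")


def clean_line(line: str) -> str:
    """Strip every known marker style from one decoded log line."""
    # Format with a dagger run at the start of the first line.
    line = LEADING_DAGGERS.sub("", line)
    # Formats that embed an invalid character (end of conversation,
    # or inside the "Display name says:" sender line).
    return INVALID_CHARS.sub("", line)
```

Daggers that appear in the middle of a message are left alone, since only a leading run is treated as a marker. A display name that itself starts with a dagger would still be mangled, which is the kind of mix-up described above.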

# by quatre on 01/17/06 at 03:39:15

OK, sorted it. The Adium logs don't use UTF-8 - they represent Unicode characters using HTML entities such as Λ (hopefully that will be escaped by this wiki). So my test logs now work in Adium. Hopefully it will work for you guys too.
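
For illustration, one way to turn non-ASCII characters into HTML entities is Python's built-in `xmlcharrefreplace` error handler. This is a sketch only: it emits decimal numeric references, and the comment does not say whether Adium's logs use decimal, hexadecimal, or named entities. The function name `to_html_entities` is made up for the example.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" substitutes &#NNNN; for anything ASCII cannot encode.
    # ASSUMPTION: decimal references are acceptable to the log viewer.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html_entities("\u039b"))  # capital lambda, U+039B
```

Note that this does not escape `&`, `<`, or `>`; a real converter would need to handle those separately.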

Again, I'll get Fuyutsuki to upload it (now version 1.2).

# by quatre on 01/17/06 at 03:20:33

OK, I've now read some material on ISO/IEC 10646 (UCS) and Unicode, and I now recognise that the encoding used in Messenger is UCS-2 (the fixed-width 16-bit form that UTF-16 extends), which the parser can now convert into UTF-8. After some tests with Greek text, I've confirmed that the conversion works, but unfortunately Adium's log viewer incorrectly interprets the result as Mac Roman, just as TextEdit does when I open the log with it (unless I explicitly tell it to open everything as UTF-8). I tried manually adding an HTML encoding line to the top of the log, which lets Safari show the characters properly, but Adium's log viewer ignores it.
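
The conversion described above can be sketched in a few lines of Python. The byte order is an assumption (little-endian, as is usual on Windows), since the comment does not state it, and `utf16le_to_utf8` is a name invented for this example rather than anything from the actual parser.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode 16-bit Messenger text as UTF-8."""
    # ASSUMPTION: little-endian, no byte order mark. UCS-2 text is valid
    # UTF-16, so the utf-16-le codec can decode it.
    text = raw.decode("utf-16-le")
    return text.encode("utf-8")


sample = "\u039b".encode("utf-16-le")  # capital lambda as two bytes
print(utf16le_to_utf8(sample))
```

Decoding the resulting UTF-8 bytes with the `mac_roman` codec instead of `utf-8` turns each Greek letter into two unrelated characters, which matches the misinterpretation seen in the log viewer and TextEdit.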

I would appreciate it if someone who routinely uses Unicode characters in their Adium conversations could post me a snippet of one of their logs, so I can see how it identifies the encoding. Or maybe it doesn't, and it's just the current locale set in the OS that's causing the problem…

I'll get Fuyutsuki to upload this new version once he next comes online (version 1.1).

# by quatre on 01/16/06 at 12:18:23

Actually, it is because the MSN logs are a binary file format, and don't store the text as UTF-8. They use a format that I'm not familiar with (16 bits per character - I think this is wchar, but I don't know much about that encoding), so all my scripts do is basically discard the most significant byte of each character (except in a few special cases - those that I found in my own logs).

I can fix this (to properly translate to UTF-8) if anyone can point me in the direction of some suitable documentation.
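
To show why discarding the most significant byte only works for some text, here is a small Python sketch of that shortcut. It assumes little-endian 16-bit units (not stated in the comment), and `drop_high_byte` is a hypothetical name, not the script's real function.

```python
def drop_high_byte(raw: bytes) -> bytes:
    """Keep only the low byte of each 16-bit character."""
    # ASSUMPTION: little-endian, so the low byte comes first in each pair.
    return raw[0::2]


ascii_text = "Hi".encode("utf-16-le")
greek_text = "\u039b".encode("utf-16-le")  # capital lambda, U+039B

print(drop_high_byte(ascii_text))  # high bytes were zero, nothing is lost
print(drop_high_byte(greek_text))  # high byte 0x03 is lost, leaving 0x9B
```

Characters below U+0100 survive because their high byte is zero; anything else, such as Greek, collapses to an unrelated single byte.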