Now that ATSC scanning support is basically working in Kaffeine4, it made sense to start on properly displaying the "Extended Channel Name". This is what lets you see "My Fox Simulcast" instead of "WWOR-DT1". A major component of this is supporting what are known as ATSC MultipleStringStructures. These are used elsewhere too, such as in the program guide, so it's generally useful work.
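For anyone curious what the structure looks like on the wire, here's a rough Python sketch that splits a multiple_string_structure() into its strings and segments, based on my reading of the field layout in ATSC A/65. The class and function names are my own inventions, and it only splits the bytes apart; actually turning a segment into text is a separate step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    compression_type: int  # 0x00 = none; 0x01/0x02 = the two Huffman table sets
    mode: int              # selects the character set (e.g. 0x3F = UTF-16)
    data: bytes            # raw segment bytes, possibly compressed


@dataclass
class StringEntry:
    language: str          # ISO 639-2 code such as "eng"
    segments: List[Segment] = field(default_factory=list)


def parse_multiple_string_structure(buf: bytes) -> List[StringEntry]:
    """Split a multiple_string_structure() into strings and segments.

    Layout as I read A/65: number_strings(8), then for each string
    ISO_639_language_code(24) and number_segments(8), then for each segment
    compression_type(8), mode(8), number_bytes(8), bytes[number_bytes].
    """
    pos = 0

    def take(n: int) -> bytes:
        # Consume n bytes, refusing to read past the end of the buffer.
        nonlocal pos
        if pos + n > len(buf):
            raise ValueError("truncated multiple_string_structure")
        chunk = buf[pos:pos + n]
        pos += n
        return chunk

    strings = []
    number_strings = take(1)[0]
    for _ in range(number_strings):
        entry = StringEntry(language=take(3).decode("ascii"))
        number_segments = take(1)[0]
        for _ in range(number_segments):
            compression_type, mode, number_bytes = take(3)
            entry.segments.append(Segment(compression_type, mode, take(number_bytes)))
        strings.append(entry)
    return strings
```

Each string can carry several segments, so a single channel name could in principle mix compressed and uncompressed pieces; that's why compression_type and mode live on the segment rather than on the string.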
The MultipleStringStructure supports several encoding formats, including UTF-16. Great. Unfortunately most broadcasters, at least those in the New York area, decided to go with an obscure English-specific encoding that uses Huffman compression. This is especially annoying because it means I have to reproduce both Huffman tables in my code, and they've got 1099 values each.
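As I understand Annex C of A/65, the Huffman coding is order-1: which code tree you walk depends on the character you just decoded. Here's a minimal Python sketch of that style of decoder. TOY_TABLES, the use of "\x00" as both the start context and the end marker, and the function names are all hypothetical illustrations; they are not the real ATSC tables, and the sketch ignores whatever escape and termination rules the spec actually defines.

```python
from typing import Dict, Iterator

# Hypothetical toy tables, NOT the ATSC A/65 Annex C tables.
# Outer key: previously decoded character (the context). "\x00" is this
# sketch's start-of-string context and also its end-of-string symbol.
# Inner dict: prefix-free bit string -> decoded character.
TOY_TABLES: Dict[str, Dict[str, str]] = {
    "\x00": {"0": "F", "10": "N", "11": "\x00"},
    "F": {"0": "o", "1": "\x00"},
    "o": {"0": "x", "1": "\x00"},
    "x": {"0": "\x00", "1": "o"},
    "N": {"0": "Y", "1": "\x00"},
    "Y": {"0": "\x00", "1": "N"},
}


def bits_of(data: bytes) -> Iterator[int]:
    """Yield the bits of data, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def huffman_decode(data: bytes, tables: Dict[str, Dict[str, str]], end: str = "\x00") -> str:
    """Decode an order-1 prefix code: each symbol picks the next table."""
    out = []
    context = end  # start-of-string context
    code = ""
    for bit in bits_of(data):
        code += "1" if bit else "0"
        symbol = tables[context].get(code)
        if symbol is None:
            continue  # not a complete code yet in this context's tree
        if symbol == end:
            return "".join(out)  # any padding bits after the end marker are ignored
        out.append(symbol)
        context = symbol  # the next code is looked up in this character's table
        code = ""
    return "".join(out)  # ran out of bytes without seeing an end marker
```

With these toy tables, the single byte 0x80 (bits 1, 0, 0, 0) decodes as "N", then "Y" in the "N" context, then the end marker in the "Y" context, giving "NY". The real work is in the tables themselves, which is exactly the part I'd have to transcribe.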
I was really hoping that nobody would actually be using the Huffman compression in a 20Mb/s data stream, but I should have known better.
I've got everything working except the decompression (so it works for broadcasters who send their strings uncompressed). I'll get the rest working tomorrow...
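The uncompressed case really is the easy part. Here's a minimal Python sketch of decoding one uncompressed segment, assuming mode 0x00 maps each byte straight onto U+0000 to U+00FF (effectively Latin-1) and mode 0x3F means UTF-16 in big-endian byte order. The function name is hypothetical, and every other mode value A/65 defines is simply rejected here.

```python
def decode_uncompressed_segment(compression_type: int, mode: int, data: bytes) -> str:
    """Turn one uncompressed MultipleStringStructure segment into text."""
    if compression_type != 0x00:
        # Huffman-compressed segments need the Annex C tables.
        raise NotImplementedError("compressed segment")
    if mode == 0x00:
        # Each byte is the low half of a code point in U+0000..U+00FF.
        return data.decode("latin-1")
    if mode == 0x3F:
        # Assumption: UTF-16 in big-endian (network) byte order.
        return data.decode("utf-16-be")
    raise NotImplementedError(f"mode 0x{mode:02X} not handled in this sketch")
```

For example, decode_uncompressed_segment(0x00, 0x00, b"WWOR-DT1") gives back "WWOR-DT1", while a compressed segment raises until the Huffman side is done.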
Interesting. I just loaded up Elgato EyeTV, and at first glance it looks like it only decodes the Extended Channel Name for channels that aren't using compression. My guess is that they never included the Huffman tables (an interesting thing to note for a commercial product). I wonder if that means the program guide typically doesn't use the Huffman tables either (or else they would have been forced to include them).
Update: It looks like MythTV already has the Huffman decoding code (it's GPL'd and even returns a QString!)
http://www.cuymedia.net/mythtv-0.21/atsc__huffman_8h-source.html