Since I've basically conceded that I am going to have to run the i2c bus at 30 kHz for the 950q, I started looking at where the rest of the time was being spent.
I started a new tree where I am putting in some incremental improvements, such as removing the unneeded udelay() calls and changing the xc5000 reset strobe to 10ms. I also started profiling how the i2c bus is actually loaded with data.
Part of the problem is that I have to program the au0828 registers for each byte to be sent. As a result, for a four byte sequence, I do one USB control command for each byte, one control command to strobe the bus, and then a series of read commands to poll for status. This results in huge gaps in the loading sequence, since the code does not continue to load new bytes while it is strobing the bytes currently in the buffer onto the i2c bus.
(notice the 661us gap between sequences of four bytes)
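To put rough numbers on that, here is a toy Python model of the loading scheme (not driver code). The function names are made up for illustration, the three status polls per group is an assumed figure (I only know it is "a series"), and the timing assumes the 661us gap was measured with the bus at 30 kHz and ignores start/stop conditions.

```python
# Toy model of the per-byte loading scheme (not driver code).
# All names here are hypothetical and exist only for this illustration.

def control_transfers(n_bytes, group_size=4, polls_per_group=3):
    """Count USB control transfers needed to send n_bytes.

    group_size: bytes loaded before each strobe (four, as in the post).
    polls_per_group: status reads per group. ASSUMED value.
    """
    groups = -(-n_bytes // group_size)   # ceiling division
    writes = n_bytes                     # one control command per data byte
    strobes = groups                     # one strobe command per group
    polls = groups * polls_per_group     # status reads while the group drains
    return writes + strobes + polls

def wire_time_us(n_bytes, clock_hz=30_000):
    """Time on the i2c wire: 8 data bits + 1 ACK bit per byte."""
    return n_bytes * 9 * 1_000_000 / clock_hz

def idle_fraction(gap_us, n_bytes=4, clock_hz=30_000):
    """Share of each group period during which the bus sits idle."""
    busy = wire_time_us(n_bytes, clock_hz)
    return gap_us / (busy + gap_us)
```

With those assumptions, control_transfers(4) comes to 8 USB control transfers for four bytes, wire_time_us(4) is 1200us, and idle_fraction(661) works out to roughly a third of the bus time spent waiting between groups.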
I started to code up a prototype where, while the polling for status was in progress, I would check how many bytes had been clocked out (stored in the top three bits of the status register) and write additional bytes as the FIFO emptied. It appears to work, and I can see longer contiguous sequences appearing in the analyzer output (with the correct data). However, the intergroup gap appears to increase almost proportionately, thereby eliminating the benefit. There also appears to be a bug somewhere in my code that causes the loading sequence to abort about halfway through.
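The refill idea can be sketched roughly as below. This is a Python simulation, not the actual prototype: FakeBridge is a hypothetical stand-in for the au0828, and it assumes an 8-bit status register, a four-byte FIFO, and a byte counter that wraps modulo 8. None of those details are confirmed for the real hardware.

```python
# Simulation of "refill the FIFO while polling" (not the real driver code).

FIFO_DEPTH = 4  # ASSUMED from the four-byte groups described in the post

def bytes_clocked_out(status):
    """Top three bits of an (assumed 8-bit) status register."""
    return (status >> 5) & 0x07

class FakeBridge:
    """Hypothetical device model: clocks out one byte per status read."""
    def __init__(self):
        self.fifo = []
        self.started = False
        self.sent = 0    # bytes clocked out so far
        self.wire = []   # bytes that reached the simulated i2c bus

    def write_byte(self, b):
        assert len(self.fifo) < FIFO_DEPTH, "FIFO overflow"
        self.fifo.append(b)

    def strobe(self):
        self.started = True

    def read_status(self):
        if self.started and self.fifo:
            self.wire.append(self.fifo.pop(0))
            self.sent += 1
        return (self.sent & 0x07) << 5   # ASSUMED: counter wraps modulo 8

def send_with_refill(dev, data):
    """Send data, topping up the FIFO between status polls. Returns poll count."""
    pending = list(data)
    in_fifo = 0
    last_count = 0
    polls = 0
    while pending and in_fifo < FIFO_DEPTH:   # prime the FIFO
        dev.write_byte(pending.pop(0))
        in_fifo += 1
    dev.strobe()
    while in_fifo:
        count = bytes_clocked_out(dev.read_status())
        polls += 1
        in_fifo -= (count - last_count) & 0x07   # bytes drained since last poll
        last_count = count
        while pending and in_fifo < FIFO_DEPTH:  # refill the freed slots
            dev.write_byte(pending.pop(0))
            in_fifo += 1
    return polls
```

The modulo-8 subtraction is what keeps the bookkeeping right once more than seven bytes have gone out. If the real counter resets per strobe instead of wrapping, that line would need to change, which is the kind of detail that could explain a sequence aborting partway through.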
I will probably give it one more night of experimentation before I give up. I've gone *way* off the supported path since this method isn't in the current driver or the Windows driver from what I can see in the USB trace.