Sunday, June 29, 2008

One hurdle down, many more to go

Just a quick update: after some re-reading of the MPEG2 spec, debugging, and cleanup, I've finally got correct output from the MC stage for progressive video clips that use frame-based and field-based motion prediction. There are two other motion prediction methods, 16x8 and dual-prime, but they don't seem to be too common and shouldn't be too hard to implement anyway. It took a bit of tweaking, but comparing the output to that of other media players, I see no difference, which means one hurdle down. The next steps are to revisit IDCT and start working with real hardware.

Here are some screen grabs from various test clips:

Construction site on a field.
Windmill in the middle of a field of yellow flowers.
Coffee mug containing two pens and a feather.
Woman on the phone.

Thursday, June 26, 2008

Progress

I put some work into getting field-based prediction working, and I think I have it mostly right. I ran into what I think is a bug in SoftPipe, which has to do with locking and updating textures. For some reason the surface and texture cache does not get invalidated in such cases, leading to stale texels being read and displayed. I now manually flush the texture cache after mapping textures, and that seems to take care of it. It took a lot of debugging to track that one down, and it is probably already fixed upstream, but at least it's another issue out of the way. At the moment some macroblocks are still not rendered correctly, but I'm hoping to get those sorted out soon.

The one thing I really can't stand is writing shader code for Gallium. The amount of C code you need to write to generate a token stream for even a simple shader is obscene. Currently I have 12 shaders, and each is about 200-300 lines of code for 10-15 shader instructions, so most of that code is noise. On more than one occasion I've made changes to the wrong shader just because it's so hard to wade through the code. What I wouldn't do for a simple TGSI assembler right about now. I'll have to do something about that; it's a huge eyesore.

It's not surprising that I'm a little behind on my schedule. I started on IDCT a while back but put that code down to focus on MC. Luckily IDCT isn't strictly necessary, as XvMC allows for MC-only acceleration, so I can test things and move forward on MC without having to worry about IDCT. I'm hoping the next step of the project, getting things running on real hardware, will be as painless as possible, leaving me time to get IDCT working. However, considering all the little unforeseen issues that have cropped up with SoftPipe, I wouldn't be surprised if I ran into more of the same with the Nouveau driver.

Monday, June 9, 2008

Moving along

Things are moving along in the right direction. I finally got a chance to push my work to date to Nouveau's mesa git; you can check it out here. I have I, P, and B macroblocks working correctly when rendering frame pictures and using frame-based motion compensation. All that's left to implement is field-based motion compensation (which is surprisingly common, even in progressive content) and rendering field-based pictures (i.e. interlaced content).

I think I've figured out a way to efficiently render macroblocks that use field-based prediction in one pass. Frame-based prediction works by grabbing a macroblock from a previously rendered surface and adding a difference to form the new macroblock. Field-based prediction works the same way, but references two macroblocks on the previously rendered surface, one for even scanlines and the other for odd. My plan is to read from both reference macroblocks on every scanline and choose which one to keep based on whether the scanline is even or odd. This can easily be done with a lerp(). It would be preferable to avoid the unnecessary texture read, but it's simple and works in a single pass. Other alternatives include rendering the macroblock twice (once with even scanlines only, then with odd scanlines, using texkill to discard alternating scanlines), and rendering even and odd scanlines using line lists (which I understand makes sub-optimal use of various caches in the pixel pipeline).
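
To make the lerp() idea a bit more concrete, here is a rough CPU-side sketch in C of what the shader would do per pixel. This is not the actual Gallium shader code: the function name predict_field_mb, its parameters, and the flat 8-bit luma surface layout are all made up for illustration, and it ignores half-pel motion vectors, chroma, and the finer points of how the MPEG2 spec addresses field lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_SIZE 16 /* luma macroblock is 16x16 pixels */

/* lerp(a, b, t) = a + (b - a) * t; with t restricted to 0 or 1 it is a select. */
static float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Round to nearest and clamp to the 8-bit pixel range. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)
        return 0;
    if (v > 255.0f)
        return 255;
    return (uint8_t)(v + 0.5f);
}

/*
 * Hypothetical helper (illustration only, not part of any real API).
 * ref:  previously rendered surface, one byte per pixel, 'stride' bytes per row
 * (even_x, even_y): top-left of the reference macroblock used for even scanlines
 * (odd_x, odd_y):   top-left of the reference macroblock used for odd scanlines
 * diff: MB_SIZE * MB_SIZE differences, row-major
 * out:  MB_SIZE * MB_SIZE reconstructed pixels, row-major
 */
static void predict_field_mb(const uint8_t *ref, int stride,
                             int even_x, int even_y,
                             int odd_x, int odd_y,
                             const int16_t *diff, uint8_t *out)
{
    assert(ref != NULL && diff != NULL && out != NULL);

    for (int y = 0; y < MB_SIZE; ++y) {
        /* 0 on even scanlines, 1 on odd scanlines. */
        float t = (float)(y & 1);

        for (int x = 0; x < MB_SIZE; ++x) {
            /* Both references are always read, like the two texture fetches. */
            float from_even = ref[(even_y + y) * stride + even_x + x];
            float from_odd = ref[(odd_y + y) * stride + odd_x + x];

            /* Keep one of the two reads depending on scanline parity. */
            float pred = lerp(from_even, from_odd, t);

            out[y * MB_SIZE + x] = clamp_u8(pred + (float)diff[y * MB_SIZE + x]);
        }
    }
}
```

Frame-based prediction is the same loop with a single reference read and no lerp(). The cost of the single-pass approach is visible in the inner loop: one of the two reads is thrown away for every pixel.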