BitBlitter

Sunday, June 29, 2008

One hurdle down, many more to go

Just a quick update: after some re-reading of the MPEG2 spec, debugging, and cleanup, I've finally got correct output for the MC stage for progressive video clips that use frame-based and field-based motion prediction. There are two other motion prediction methods, 16x8 and dual-prime, but they don't seem to be too common and shouldn't be too hard to implement anyway. It took a bit of tweaking, but comparing the output to that of other media players I see no difference, which means one hurdle down. Next steps are to revisit IDCT and start working with real hardware.

Here are some screen grabs from various test clips:

Thursday, June 26, 2008

Progress

I put some work into getting field-based prediction working, and I think I have it mostly right. I ran into what I think is a bug in SoftPipe, which has to do with locking and updating textures. For some reason the surface and texture cache does not get invalidated in such cases, leading to stale texels being read and displayed. I manually flush the texture cache after mapping textures, and that seems to take care of it. It took a lot of debugging to track that one down and is probably fixed upstream, but at least it's another issue out of the way. At the moment some macroblocks are still not rendered correctly, but I'm hoping to get those out of the way.

The one thing I really can't stand is writing shader code for Gallium. The amount of C code you need to write to generate a token stream for even a simple shader is obscene. Currently I have 12 shaders and each is about 200-300 lines of code for 10-15 shader instructions, so most of that code is noise. On more than one occasion I've made changes to the wrong shader just because it's so hard to wade through the code. What I wouldn't do for a simple TGSI assembler right about now. I'll have to do something about that; it's a huge eyesore.

It's not surprising that I'm a little behind on my schedule. I started on IDCT a while back but put that code down to focus on MC. Luckily IDCT isn't strictly necessary, as XvMC allows for MC-only acceleration, so I can test things and move forward on MC without having to worry about IDCT. I'm hoping the next step of the project, getting things running on real hardware, will be as painless as possible, allowing me to get IDCT working. However, considering all the little unforeseen issues that have cropped up with SoftPipe, I wouldn't be surprised if I ran into more of the same with the Nouveau driver.

Monday, June 9, 2008

Moving along

Things are moving along in the right direction. I finally got a chance to push my work to date to Nouveau's mesa git; you can check it out here. I have I, P, and B macroblocks working correctly when rendering frame pictures and using frame-based motion compensation. All that's left to implement is field-based motion compensation (which is surprisingly common, even in progressive content) and rendering field-based pictures (i.e. interlaced content). I think I've figured out a way to efficiently render macroblocks that use field-based prediction in one pass. Frame-based prediction works by grabbing a macroblock from a previously rendered surface and adding a difference to form the new macroblock. Field-based prediction works the same way, but references two macroblocks on the previously rendered surface, one for even scanlines and the other for odd. My plan is to read from both reference macroblocks every scanline and choose which one to keep based on whether the scanline is even or odd. This can easily be done with a lerp(). It would be preferable to avoid the unnecessary texture read, but it's simple and works in a single pass. Other alternatives include rendering the macroblock twice (once with even scanlines only, then with odd scanlines, using texkill to discard alternating scanlines), and rendering even and odd scanlines using line lists (which I understand makes sub-optimal use of various caches in the pixel pipeline).
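To make the lerp() idea concrete, here is a rough CPU-side model in C of what the fragment shader would do per pixel. It is only a sketch: predict_field_mb, MB_SIZE, and the flat 16x16 buffers are names made up for illustration, and it assumes the two reference macroblocks have already been fetched via their motion vectors. The real thing would be two texture reads and one LRP instruction, not a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_SIZE 16 /* luma macroblock is 16x16 samples */

/* Linear interpolation, as a shader lerp() would compute it. */
static float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

/*
 * Hypothetical model of the single-pass approach: for every scanline,
 * sample BOTH reference macroblocks and keep one based on parity.
 * ref_even, ref_odd, and out are flat MB_SIZE * MB_SIZE arrays.
 */
static void predict_field_mb(const uint8_t *ref_even,
                             const uint8_t *ref_odd,
                             uint8_t *out)
{
    assert(ref_even != NULL && ref_odd != NULL && out != NULL);

    for (size_t y = 0; y < MB_SIZE; ++y) {
        /* t is 0 on even scanlines and 1 on odd scanlines. */
        float t = (float)(y & 1);

        for (size_t x = 0; x < MB_SIZE; ++x) {
            float a = (float)ref_even[y * MB_SIZE + x];
            float b = (float)ref_odd[y * MB_SIZE + x];

            /* With t restricted to 0 or 1, lerp() acts as a select. */
            out[y * MB_SIZE + x] = (uint8_t)(lerp(a, b, t) + 0.5f);
        }
    }
}
```

Because t is only ever 0 or 1, the interpolation never actually blends the two samples; it just picks one, which is why one of the two reads is always wasted.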

Monday, May 26, 2008

Something to show

I've put some more work into getting P and B frames rendering correctly and things are proceeding very well. Currently texturing from the reference frame works for P frames. All I have to do is add the differentials, which is a little tricky. The problem is that differentials are 9 bits, which means that in an A8L8 texture we get 8 bits in the L channel and 1 bit in the A channel. This shouldn't be too hard, just a bit of arithmetic in the pixel shader code. A more interesting problem is dealing with field-based surfaces, both when rendering and when using them in motion prediction. There's no straightforward way to render to even/odd scanlines on conventional hardware, so this will require some special attention. Currently I'm thinking I will have to render line lists instead of triangle lists when a macroblock uses field-based motion prediction and for rendering even/odd scanlines.
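The "bit of arithmetic" for the differentials might look something like the following C sketch. The packing here is an assumption chosen for illustration, not necessarily what the decoder hands over: the differential is biased by 256 into the range [0, 511], the low 8 bits go in the L channel, and the ninth bit goes in the A channel. pack_diff, unpack_diff, and reconstruct are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed packing (illustrative only): biased = diff + 256, in [0, 511].
 * L holds the low 8 bits, A holds bit 8.
 */
static void pack_diff(int diff, uint8_t *l, uint8_t *a)
{
    assert(diff >= -256 && diff <= 255); /* 9-bit signed range */

    unsigned biased = (unsigned)(diff + 256);
    *l = (uint8_t)(biased & 0xFF);
    *a = (uint8_t)(biased >> 8);
}

/*
 * What the pixel shader would do: the sampler returns each channel
 * normalised to [0, 1], so scale back up, recombine, and remove the bias.
 */
static int unpack_diff(float l_norm, float a_norm)
{
    int low = (int)(l_norm * 255.0f + 0.5f);
    int high = (int)(a_norm * 255.0f + 0.5f);
    return low + high * 256 - 256;
}

/* Add the differential to the predicted sample and clamp to 8 bits. */
static uint8_t reconstruct(uint8_t prediction, uint8_t l, uint8_t a)
{
    int v = (int)prediction + unpack_diff(l / 255.0f, a / 255.0f);

    if (v < 0)
        v = 0;
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```

For example, a differential of -3 packs to L = 253, A = 0 under this scheme, and reconstruct() applied to a predicted sample of 100 gives 97.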

Here are some images from mpeg2play_accel, which I've been using as a test program:

Initial I-frame of the video.

Next frame, only P macroblocks using frame-based motion prediction are currently displayed, the rest are skipped.

Next frame, more macroblocks are rendered, and it looks mostly correct, except for the fine details. This is because the differentials are not taken into account yet.

Next frame, a few more unhandled macroblocks in this one.

Sunday, May 18, 2008

Intra-coded macroblocks? Check

After a few weeks of work I've made some good progress. Basic rendering of intra-coded macroblocks is working. What this means is that if you view a video you'll see the occasional full frame displayed correctly, and some macroblocks from the frames in between displayed correctly. Intra-coded macroblocks are the simplest to deal with, since they don't depend on motion compensation; all the data is present and you just have to render it. Every nth frame of an MPEG2 stream is composed entirely of intra-coded macroblocks. It's these frames that are currently being displayed correctly. Other frames are composed of some intra-coded macroblocks, but mostly inter-coded macroblocks. Inter-coded macroblocks depend on motion compensation and their samples are usually differentials. These I haven't gotten yet.

I've also cleaned things up a bit, added some error checking, and added some more tests. It's taken a lot of stepping through Gallium code to get things right, in lieu of documentation, but thanks to GDB, and even more to Insight, I've gotten this far. Stephane has answered my questions, mostly on how to do things efficiently, and even the folks in #mplayerdev have been helpful on XvMC and general decoding matters, so all in all I would say things are going smoothly.

One thing I'm sure of is that no one reads this thing currently. The X.Org folks have asked that their students keep a blog and also submit it to planet.freedesktop.org but I was told it only accepts RSS feeds. Currently this entire web site is maintained using a text editor, so I'll have to work out something more sophisticated in the near future. :-/ (Update: Since then I've been using BlogSpot and you're probably reading this post there instead of the old page.)

Thursday, May 1, 2008

Up and running

Today I managed to get the basic color conversion step up and running using SoftPipe. Most of the difficulty came in understanding Gallium more than implementing the color conversion stuff. I spent many hours trying to figure out why I couldn't get any geometry to show up in my window. Copying surfaces to the frame buffer worked fine, but rendering a triangle left me staring at a black screen. It turns out that you have to set the pipe_blend_state.colormask bits for the channels you want to write to. First, I didn't even consider that state because I disabled blending. Second, setting the mask to allow writes was the opposite of what I would assume. It took several hours of stepping through Gallium to find that everything was OK until we got to the fragment shader, where it skipped the frame buffer write back.
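For anyone hitting the same black screen, here is a small stand-alone C model of what a color write mask does. It is not Gallium code: the MASK_* constants and masked_write are stand-ins invented for this sketch (Gallium has its own PIPE_MASK_* definitions), but the behaviour is the point. A zeroed mask means nothing reaches the frame buffer, whether or not blending is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in channel bits for illustration; not Gallium's definitions. */
enum {
    MASK_R = 1 << 0,
    MASK_G = 1 << 1,
    MASK_B = 1 << 2,
    MASK_A = 1 << 3,
    MASK_RGBA = MASK_R | MASK_G | MASK_B | MASK_A
};

/*
 * Model of the frame buffer write-back: the shaded color in src is
 * copied into dst only for channels whose bit is SET in colormask.
 * Blending plays no part in this decision.
 */
static void masked_write(uint8_t dst[4], const uint8_t src[4],
                         unsigned colormask)
{
    assert((colormask & ~(unsigned)MASK_RGBA) == 0);

    for (int c = 0; c < 4; ++c) {
        if (colormask & (1u << c))
            dst[c] = src[c];
    }
}
```

A zero-initialised state struct therefore behaves like masked_write(dst, src, 0): the triangle is rasterised and shaded, and then every channel write is skipped.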

Other issues included getting a handle on writing TGSI shader code and figuring out how to get the Gallium and XvMC APIs to agree. At the moment generating TGSI isn't a pretty process; you can look in gallium/auxiliary/util/u_simple_shaders.c for an example. As for Gallium and XvMC agreeing, most of the problem came from the fact that XvMC functions all accept a Display*. What do you do if the client creates an XvMC context with one Display*, creates a surface with another Display*, and so on? Well, hopefully no one will do that, but one has to wonder why it's even allowed. Then there's the issue of some calls only taking an XvMCSurface*, and not the associated context. Unfortunately the context is where I keep the Gallium pipe context, so every surface has to have a reference to the context it was created with. Luckily this works out, since some functions that do take both a surface and a context require that we check that they match, so at least that check becomes simple.
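The surface-to-context back-reference boils down to something like this C sketch. All of the types and function names here are hypothetical stand-ins written for illustration; they are not the XvMC structs or the actual state tracker code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Gallium pipe context. */
struct pipe_context_stub {
    int id;
};

/* Hypothetical private data hung off the XvMC context. */
struct ctx_priv {
    struct pipe_context_stub *pipe;
};

/* Hypothetical private data hung off each XvMC surface. */
struct surf_priv {
    struct ctx_priv *ctx; /* context this surface was created with */
};

/* For calls that receive only a surface: reach the pipe through it. */
static struct pipe_context_stub *surface_pipe(const struct surf_priv *s)
{
    assert(s != NULL && s->ctx != NULL);
    return s->ctx->pipe;
}

/* For calls that receive both: the match check is a pointer compare. */
static int surface_matches_context(const struct surf_priv *s,
                                   const struct ctx_priv *c)
{
    return s != NULL && c != NULL && s->ctx == c;
}
```

Storing the pointer once at surface creation covers both cases: surface-only calls can find the pipe context, and calls that take both arguments get their validation nearly for free.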

Friday, April 25, 2008

Accepted, digging through code

After a long interim period I now know that the proposal has been accepted. Rather than sit around I've been working on getting things up and running, so I'm glad I got a head start on things. It took a lot of digging through Mesa code and some questions to the dri-devel mailing list and IRC channel, but I've managed to get some basic initialization out of the way. I've also implemented enough functionality and stubs to get some basic test cases compiling and running successfully. I found a port of mpeg2play on Mark Vojkovich's web site that uses XvMC and have managed to get that compiling and running. By running I mean not crashing; it doesn't display anything as of yet, but at least I'm heading in the right direction.

Now I'll need to figure out how to get XvMC surfaces onto X drawables with Gallium3D. For some reason most of the XvMC functions don't take the XvMCContext as an argument, so I have to store that along with each surface, and yet they all take a pointer to Display, which I don't see a use for. It's a headache more than anything else, but it seems counter-intuitive to me. Also, the Gallium3D API is new to me and it will take some time to figure out. Thankfully, Keith Whitwell provided me with an in-depth explanation of how to start on the state tracker and winsys. I'm hoping by the end of this weekend I'll have something on screen, even if it's garbage (i.e. the video frames before IDCT). I'm also hoping to get started on writing shader code to do the color conversion. Stephane Marchesin, my mentor for this project, was kind enough to point me to the current Xv implementation for the Nouveau driver, which does color conversion and bicubic interpolation in shaders.