Monday, July 21, 2008

Up and running on real hardware

I reached a nice milestone today: working playback on my Geforce 6200. Most of the work went into the winsys layer, with some bug fixes and workarounds in other places, but everything is up and running now. Unfortunately the output isn't perfect, there is some slight corruption here and there. I'm guessing it has to do with some dodgy assumptions I made about shader arithmetic (rounding, saturation, etc) that SoftPipe went along with but the GPU didn't. The other issue is that there is some severe slowdown when any 2D drawing happens on the rest of the desktop. I'm guessing this may be due locking when copying the backbuffer to the window, or maybe I'm completely soaking up the CPU.

Currently nothing is optimized, I'm not even turning on compiler optimization, and I have a really slow prototype IDCT implementation performed on the CPU in place of the hardware version, so I'm sure I'm eating up a lot more CPU time than I will be by the end of the summer. I have a lot of different ideas on optimization that will target CPU usage and GPU fillrate usage, but given that I get almost full speed playback currently, I'm pretty confident that I'll be able to get HD playback by the end of SoC.

As far as the winsys goes, I was able to use most of the current Nouveau winsys. Unfortunately the DRI stuff is buried within Mesa right now, so I had to extract a lot of things and create a standalone library to handle screens, drawables, the SAREA, etc. to be able to use DRI without including and linking with half of Mesa. The winsys interface is also simpler than Mesa's; there are only a few client calls, the backbuffer is handled in the state tracker, and the winsys doesn't have to create or call into the state tracker. It took me a while to realize why the Mesa winsys was set up the way it was, and that I could simpify things on my end.

Here are some screen grabs:

Coffee mug containing two pens and a feather. Woman on the phone. Windmill in the middle of a field of yellow flowers.
Two cartoon characters (desktop visible). Fighter jet flying through a clear sky (desktop visible).

Sunday, June 29, 2008

One hurdle down, many more to go

Just a quick update; after some re-reading of the MPEG2 spec, debugging, and clean up I've finally got correct output for the MC stage for progressive video clips that use frame based and field based motion prediction. There are two other motion prediction methods, 16x8 and dual-prime, but they don't seem to be too common and shouldn't be too hard to implement anyway. It took a bit of tweaking, but comparing the output to that of other media players I see no difference, which means one hurdle down. Next steps are to revisit IDCT and start working with real hardware.

Here are some screen grabs from various test clips:

Construction site on a field.
Windmill in the middle of a field of yellow flowers.
Coffee mug containing two pens and a feather.
Woman on the phone.

Thursday, June 26, 2008

Progress

I put some work into getting field-based prediction working, and I think I have it mostly right. I ran into what I think is a bug in SoftPipe, which has to do with locking and updating textures. For some reason the surface and texture cache does not get invalidated in such cases, leading to stale texels being read and displayed. I manually flush the texture cache after mapping textures, and that seems to take care of it. It took a lot of debugging to track that one down and is probably fixed upstream, but at least it's another issue out of the way. At the moment some macroblocks are still not rendered correctly, but I'm hoping to get those out of the way.

The one thing I really can't stand is writing shader code for Gallium. The amount of C code you need write to generate a token stream for even a simple shader is obscene. Currently I have 12 shaders and each is about 200-300 lines of code for 10-15 shader instructions, so most of that code is noise. On more than one occasion I've made changes to the wrong shader just because it's so hard to wade through the code. What I wouldn't do for a simple TGSI assembler right about now. I'll have to do something about that, it's a huge eye sore.

It's not surprising that I'm a little behind on my schedule. I started on IDCT a while back but put that code down to focus on MC. Luckily IDCT isn't strictly necessary as XvMC allows for MC-only acceleration, so I can test things and move forward on MC without having to worry about IDCT. I'm hoping the next step of the project, getting things running on real hardware, will be as painless as possible allowing me to get IDCT working. However, considering all the little unforseen issues that have cropped up with SoftPipe I wouldn't be surprised if I ran into more of the same with the Nouveau driver.

Monday, June 9, 2008

Moving along

Things are moving along in the right direction. I finally got a chance to push my work to date to Nouveau's mesa git, you can check it out here. I have I, P, and B macroblocks working correctly when rendering frame pictures and using frame-based motion compensation. All that's left is to implement is field-based motion compensation (which is surprisingly very common, even in progressive content), and rendering field-based pictures (i.e. interlaced content). I think I've figured out a way to efficiently render macroblocks that use field-based prediction in one pass. Frame-based prediction works by grabbing a macroblock from a previously rendered surface and adding a difference to form the new macroblock. Field-based prediction works the same way, but references two macroblocks on the previously rendered surface, one for even scanlines and other for odd. My plan is to read from both reference macroblocks every scanline and choose which one to keep based on whether or not the scanline is even or odd. This can easily be done with a lerp(). It would be preferable to avoid the unecessary texture read, but it's simple and works in a single pass. Other alternatives include rendering the macroblock twice (once with even scanlines only, then with odd scanlines, using texkill to discard alternating scanlines), and rendering even and odd scanlines using line lists (which I understand makes sub-optimal usage of various caches in the pixel pipeline).

Monday, May 26, 2008

Something to show

I've put some more work into getting P and B frames rendering correctly and things are proceeding very well. Currently texturing from the reference frame works for P frames. All I have to do is add the differentials, which is a little tricky. The problem is that differentials are 9 bits, which means that in an A8L8 texture we get 8 bits in the L channel and 1 bit in the A channel. This shouldn't be too hard, just a bit of arithmetic in the pixel shader code. A more interesting problem is dealing with field-based surfaces, both when rendering and when using them in motion prediction. There's no straightforward way to render to even/odd scanlines on conventional hardware, so this will require some special attention. Currently I'm thinking I will have to render line lists instead of triangle lists when a macroblock uses field-based motion prediction and for rendering even/odd scanlines.

Here are some images from mpeg2play_accel, which I've been using as a test program:

Initial I-frame of the video.

Initial I-frame of the video.

Next frame, only P macroblocks using frame-based motion prediction are currently displayed, the rest are skipped.

Next frame, only P macroblocks using frame-based motion prediction are currently displayed, the rest are skipped.

Next frame, more macroblocks are rendered, and it looks mostly correct, except for the fine details. This is because the differentials are not taken into account yet.

Next frame, more macroblocks are rendered, and it looks mostly correct, except for the fine details. This is because the differentials are not taken into account yet.

Next frame, a few more unhandled macroblocks in this one.

Next frame, a few more unhandled macroblocks in this one.

Sunday, May 18, 2008

Intra-coded macroblocks? Check

After a few weeks of work I've made some good progress. Basic rendering of intra-coded macroblocks is working. What this means is that if you view a video you'll see the occasional full frame displayed correctly, and some macroblocks from the frames in between displayed correctly. Intra-coded macroblocks are the simplest to deal with, since they don't depend on motion compensation; all the data is present and you just have to render it. Every nth frame of an MPEG2 stream is composed entirely of intra-coded macroblocks. It's these frames that are currently being displayed correctly. Other frames are composed of some intra-coded macroblocks, but mostly inter-coded macroblocks. Inter-coded macroblocks depend on motion compensation and their samples are usually differentials. These I haven't gotten yet.

I've also cleaned things up a bit, added some error checking, and added some more tests. It's taken a lot of stepping through Gallium code to get things right, in leau of documentation, but thanks to GDB, and even more to Insight, I've gotten this far. Stephane has answered my questions, mostly on how to efficiently do things, and even the folks in #mplayerdev have been helpful on XvMC and general decoding matters, so all in all I would say things are going smoothly.

One thing I'm sure of is that no one reads this thing currently. The X.Org folks have asked that their students keep a blog and also submit it to planet.freedesktop.org but I was told it only accepts RSS feeds. Currently this entire web site is maintained using a text editor, so I'll have to work out something more sophisticated in the near future. :-/ (Update: Since then I've been using BlogSpot and you're probably reading this post there instead of the old page.)

Thursday, May 1, 2008

Up and running

Today I managed to get the basic color conversion step up and running using SoftPipe. Most of the difficulty came in understanding Gallium more than implementing the color conversion stuff. I spent many hours trying to figure out why I couldn't get any geometry to show up in my window. Copying surfaces to the frame buffer worked fine, but rendering a triangle left me staring at a black screen. It turns out that you have to set the pipe_blend_state.colormask bits for the channels you want to write to. First, I didn't even consider that state because I disabled blending. Second, setting the mask to allow writes was the opposite of what I would assume. It took several hours of stepping through Gallium to find that everything was OK until we got to the fragment shader, where it skipped the frame buffer write back.

Other issues included getting a handle on writing TGSI shader code and figuring out how to get Gallium and XvMC APIs to agree. At the moment generating TGSI isn't a pretty process, you can look in gallium/auxiliary/util/u_simple_shaders.c for an example. As for Gallium and XvMC agreeing, most of the problem came from the fact that XvMC functions all accept a Display*. What do you do if the client creates an XvMC context with one Display*, creates a surface with another Display*, and so on? Well, hopefully no one will do that, but one has to wonder why it's even allowed. Then there's the issue of some calls only taking an XvMCSurface*, and not the associated context. Unfortunately the context is where I keep the Gallium pipe context, so every surface has to have a reference to the context it was created with. Luckily this works out since some functions that do take a surface and context require that we check that they match, so at least it makes that simple.