Monday, May 26, 2008

Something to show

I've put some more work into getting P and B frames rendering correctly, and things are proceeding very well. Currently texturing from the reference frame works for P frames; all I have to do is add the differentials, which is a little tricky. The problem is that differentials are 9 bits, which means that in an A8L8 texture we get 8 bits in the L channel and 1 bit in the A channel. This shouldn't be too hard, just a bit of arithmetic in the pixel shader code. A more interesting problem is dealing with field-based surfaces, both when rendering to them and when using them in motion prediction. There's no straightforward way to render to only the even or odd scanlines on conventional hardware, so this will require some special attention. Currently I'm thinking I'll have to render line lists instead of triangle lists when a macroblock uses field-based motion prediction, and likewise when rendering to even/odd scanlines.
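As a sketch of the arithmetic involved, here is one plausible way to pack a signed 9-bit differential into an A8L8 texel and recover it again. The biased layout (low 8 bits in L, 9th bit in A) is my assumption for illustration, not necessarily the layout the tracker will end up using:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t l, a; } a8l8_texel;

/* Pack a signed 9-bit differential (range [-256, 255]) by biasing it
 * into [0, 511]: low 8 bits go in L, the 9th bit goes in A. */
static a8l8_texel pack_diff(int diff)
{
    unsigned biased = (unsigned)(diff + 256);
    a8l8_texel t = { (uint8_t)(biased & 0xFF), (uint8_t)((biased >> 8) & 0x1) };
    return t;
}

/* The arithmetic the pixel shader has to mirror: recombine the two
 * channels and remove the bias. (In the shader the channels arrive
 * normalized to [0.0, 1.0], so this becomes a multiply-add along the
 * lines of l * 255.0 + round(a * 255.0) * 256.0 - 256.0 rather than
 * integer bit operations.) */
static int unpack_diff(a8l8_texel t)
{
    return (int)(((unsigned)t.a << 8) | t.l) - 256;
}
```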

Here are some images from mpeg2play_accel, which I've been using as a test program:

Initial I-frame of the video.

Next frame, only P macroblocks using frame-based motion prediction are currently displayed, the rest are skipped.

Next frame, more macroblocks are rendered, and it looks mostly correct, except for the fine details. This is because the differentials are not taken into account yet.

Next frame, a few more unhandled macroblocks in this one.

Sunday, May 18, 2008

Intra-coded macroblocks? Check

After a few weeks of work I've made some good progress. Basic rendering of intra-coded macroblocks is working. What this means is that if you view a video you'll see the occasional full frame displayed correctly, along with some macroblocks from the frames in between. Intra-coded macroblocks are the simplest to deal with, since they don't depend on motion compensation; all the data is present and you just have to render it. Every nth frame of an MPEG2 stream is composed entirely of intra-coded macroblocks, and it's these frames that are currently being displayed correctly. Other frames are composed of some intra-coded macroblocks, but mostly inter-coded macroblocks. Inter-coded macroblocks depend on motion compensation, and their samples are usually differentials. Those I haven't gotten to yet.
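To make the intra/inter distinction concrete, here's a minimal sketch of inter-coded reconstruction (illustrative only, not the tracker's actual code): the prediction is fetched from the reference frame at the motion-vector offset, the signed differential is added, and the result is clamped to the valid sample range. An intra block would skip the prediction and use its decoded samples directly.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_W 8
#define BLK_H 8

static uint8_t clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Reconstruct one 8x8 inter-coded block at (x, y): motion-compensated
 * prediction from the reference frame plus a signed differential. */
static void reconstruct_inter(uint8_t *dst, const uint8_t *ref, int stride,
                              int x, int y, int mv_x, int mv_y,
                              const int16_t *diff)
{
    for (int j = 0; j < BLK_H; ++j)
        for (int i = 0; i < BLK_W; ++i) {
            int pred = ref[(y + mv_y + j) * stride + (x + mv_x + i)];
            dst[(y + j) * stride + (x + i)] =
                clamp255(pred + diff[j * BLK_W + i]);
        }
}
```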

I've also cleaned things up a bit, added some error checking, and added some more tests. It's taken a lot of stepping through Gallium code to get things right, in lieu of documentation, but thanks to GDB, and even more to Insight, I've gotten this far. Stephane has answered my questions, mostly on how to efficiently do things, and even the folks in #mplayerdev have been helpful on XvMC and general decoding matters, so all in all I would say things are going smoothly.

One thing I'm sure of is that no one reads this thing currently. The X.Org folks have asked that their students keep a blog and also submit it to planet.freedesktop.org but I was told it only accepts RSS feeds. Currently this entire web site is maintained using a text editor, so I'll have to work out something more sophisticated in the near future. :-/ (Update: Since then I've been using BlogSpot and you're probably reading this post there instead of the old page.)

Thursday, May 1, 2008

Up and running

Today I managed to get the basic color conversion step up and running using SoftPipe. Most of the difficulty came in understanding Gallium more than implementing the color conversion stuff. I spent many hours trying to figure out why I couldn't get any geometry to show up in my window. Copying surfaces to the frame buffer worked fine, but rendering a triangle left me staring at a black screen. It turns out that you have to set the pipe_blend_state.colormask bits for the channels you want to write to. First, I didn't even consider that state because I disabled blending. Second, setting the mask to allow writes was the opposite of what I would assume. It took several hours of stepping through Gallium to find that everything was OK until we got to the fragment shader, where it skipped the frame buffer write back.
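The lesson in miniature, using a simplified model of the state involved (the field names loosely mirror Gallium's pipe_blend_state of the time, but the real struct carries much more): the color mask gates framebuffer writes whether or not blending is enabled, so leaving it zeroed silently discards every fragment.

```c
#include <assert.h>
#include <stdint.h>

enum { MASK_R = 1, MASK_G = 2, MASK_B = 4, MASK_A = 8,
       MASK_RGBA = MASK_R | MASK_G | MASK_B | MASK_A };

struct blend_state {
    int blend_enable;     /* ignored here: masking applies either way */
    unsigned colormask;   /* which channels may be written */
};

/* Write src into dst, but only the channels the mask allows. */
static void write_pixel(const struct blend_state *bs,
                        uint8_t dst[4], const uint8_t src[4])
{
    for (int c = 0; c < 4; ++c)
        if (bs->colormask & (1u << c))
            dst[c] = src[c];
}
```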

Other issues included getting a handle on writing TGSI shader code and figuring out how to get the Gallium and XvMC APIs to agree. At the moment generating TGSI isn't a pretty process; you can look in gallium/auxiliary/util/u_simple_shaders.c for an example. As for Gallium and XvMC agreeing, most of the problem came from the fact that XvMC functions all accept a Display*. What do you do if the client creates an XvMC context with one Display*, creates a surface with another Display*, and so on? Well, hopefully no one will do that, but one has to wonder why it's even allowed. Then there's the issue of some calls taking only an XvMCSurface*, and not the associated context. Unfortunately the context is where I keep the Gallium pipe context, so every surface has to keep a reference to the context it was created with. Luckily this works out, since some functions that do take both a surface and a context require that we check that they match, so at least it makes that check simple.
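A hypothetical sketch of that bookkeeping (the type and field names here are illustrative, not the tracker's actual ones): each surface carries a pointer to the context it was created with, and any call that receives both can cheaply verify they match.

```c
#include <assert.h>
#include <stddef.h>

struct xvmc_context_priv {
    void *pipe;   /* the Gallium pipe context lives here, among other state */
};

struct xvmc_surface_priv {
    struct xvmc_context_priv *context;  /* owning context, set at creation */
    /* ... surface storage ... */
};

/* For entry points that receive both a surface and a context: confirm
 * the surface really belongs to the given context. */
static int surface_matches_context(const struct xvmc_surface_priv *s,
                                   const struct xvmc_context_priv *c)
{
    return s != NULL && c != NULL && s->context == c;
}
```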

Friday, April 25, 2008

Accepted, digging through code

After a long interim period I now know that the proposal has been accepted. Rather than sit around during the wait I'd been working on getting things up and running, so I'm glad I got a head start. It took a lot of digging through Mesa code and some questions to the dri-devel mailing list and IRC channel, but I've managed to get some basic initialization out of the way. I've also implemented enough functionality and stubs to get some basic test cases compiling and running successfully. I found a port of mpeg2play on Mark Vojkovich's web site that uses XvMC and have managed to get that compiling and running. By running I mean not crashing; it doesn't display anything as of yet, but at least I'm heading in the right direction.

Now I'll need to figure out how to get XvMC surfaces onto X drawables with Gallium3D. For some reason most of the XvMC functions don't take the XvMCContext as an argument, so I have to store that along with each surface, and yet they all take a pointer to Display, which I don't see a use for. A headache more than anything else, but it seems counter-intuitive to me. Also, the Gallium3D API is new to me and it will take some time to figure out. Thankfully, Keith Whitwell provided me with an in-depth explanation of how to start on the state tracker and winsys. I'm hoping that by the end of this weekend I'll have something on screen, even if it's garbage (i.e. the video frames before IDCT). I'm also hoping to get started on writing shader code to do the color conversion. Stephane Marchesin, my mentor for this project, was kind enough to point me to the current Xv implementation for the Nouveau driver, which currently does color conversion and bicubic interpolation in shaders.

Monday, April 7, 2008

Submission day redux

So after last week's deadline extension, today became the deadline for the proposal submission. It hasn't really affected me, but it would have been nice to know by now whether this project had been accepted. Instead, the accepted proposals will be announced April 21st.

In the interim I've been looking through the libXvMC and Mesa sources. The source to libXvMC is a little confusing, partly because of the wrapper library that comes with it, but after a little reading, grep-ing, and peeking at the openChrome XvMC driver I think I've got a handle on how things work. As far as I can see, the libXvMC module provides implementations for all the hardware-agnostic XvMC API calls and leaves the rest to the driver. It also exports some functions that the driver can use to interface with X. The wrapper module, libXvMCW, is intended for clients to link against and exports all the XvMC functions the client expects. The wrapper doesn't contain any implementation, but instead attempts to dynamically load the libXvMC module for the hardware-agnostic functions, and a hardware-specific driver (e.g. libXvMCNvidia) for the rest. The driver is left to implement the surface/block/rendering-related functions. So with that, I think it's pretty clear which functions I have to provide in terms of Gallium3D to complete the implementation.
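That arrangement can be sketched with a static dispatch table standing in for the dynamic loading step (the real wrapper resolves these entry points at runtime with dlopen()/dlsym(); the function names and signatures here are illustrative, not the actual XvMC API):

```c
#include <assert.h>
#include <stddef.h>

/* Entry points the hardware-specific driver must supply. */
struct xvmc_driver_funcs {
    int (*create_surface)(int width, int height);
    int (*render_surface)(int surface_id);
};

/* A stand-in hardware-specific "driver" implementation. */
static int my_create_surface(int w, int h) { return (w > 0 && h > 0) ? 1 : -1; }
static int my_render_surface(int id)       { return id > 0 ? 0 : -1; }

static const struct xvmc_driver_funcs my_driver = {
    my_create_surface, my_render_surface
};

/* What the wrapper exports to clients: no implementation of its own,
 * just forwarding through whichever table the loading step filled in. */
static const struct xvmc_driver_funcs *loaded_driver = &my_driver;

static int wrapped_create_surface(int w, int h)
{
    return loaded_driver ? loaded_driver->create_surface(w, h) : -1;
}

static int wrapped_render_surface(int id)
{
    return loaded_driver ? loaded_driver->render_surface(id) : -1;
}
```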

In addition to that I've gotten back into using Matlab for some prototyping. Matlab is a great tool for this sort of thing because it allows you to easily visualize your data, and I've been using it to test some CSC and IDCT routines.

Monday, March 31, 2008

Submission day

Today is the deadline for the proposal submission. I hope it is accepted; I think it's a perfect fit for GSoC. I had some trouble trimming it down to 7,500 characters because I initially misread the guidelines and thought they said 7,500 words; if it weren't for someone on the dri-devel mailing list reminding me, I'd have submitted all 8K characters. I'm feeling lucky already!

Saturday, March 29, 2008

Potential benefits of accelerated video decoding?

I've started considering some of the potential benefits of hardware accelerated video decoding from a user's perspective. The biggest one for me is being able to play back HD streams in real-time. I have a modest machine and it does struggle with HD streams, but having read this paper I'm encouraged by one statement in particular. Testing on a machine equipped with a Pentium III @ 667 MHz, 256 MB of memory, and a GeForce 3 Ti200, they state that they were able to play back a 720p ~24-frame/s stream encoded with WMV at 5 Mb/s. That hardware is pretty ancient by today's standards, and yet with the GPU handling MC and CSC they get a 3.16x speed-up over the CPU w/ MMX implementation. That's pretty encouraging. I'm sure all the folks out there who use their machines as HTPCs would really appreciate that sort of performance.

Another benefit would be that implementing this in terms of a Gallium3D front-end allows it to be used on all hardware that has a Gallium3D back-end. Currently I believe certain Intel GPUs are supported very well, as well as Nvidia through Nouveau, but I'm thinking specifically of AMD/ATI. As far as I know their Linux drivers have never supported any sort of video decoding acceleration, even though the hardware is very capable and the functionality is implemented on Windows. Recently they released hardware specs for some of their GPUs, but this did not include any of the dedicated video decoding hardware if I recall correctly. However, with specs for the newer GPUs and various reverse-engineered drivers for older GPUs already existing, comprehensive Gallium3D support for ATI GPUs will probably happen. I think the point is obvious by now: hardware accelerated video decoding on ATI GPUs finally.