geek++;

Aug 03, 2004 23:51

Tonight I picked Calyx back up, starting with the display/hardware layer. One of my design requirements was alpha-blending support in the system.

Long ago, when I started on dGUI/Calyx, I found what was purported to be a very fast software alpha blender, which is great as a fallback for drivers that don't implement blending themselves. So today I went looking for the document I got the concept and base code from, and this time around I realised it could be made better.

The following is contained in a comment from the Calyx source tree. Prettified for HTML, of course. Yeah, there's probably an error or two in here somewhere. I don't care.

http://www.gamedev.net/reference/articles/article1594.asp

That's where the idea behind this blending method comes from. I intend to take his idea and make it even faster.

His table initialiser (reformatted for readability):

int InitTable() {
     float fValue, fAlpha;
     int iValue, iAlpha;
     for (iAlpha = 0; iAlpha < 256; iAlpha++) {
          fAlpha = ((float)iAlpha) / 255;
          for (iValue = 0; iValue < 256; iValue++) {
               fValue = ((float)iValue) / 255;
               AlphaTable.Levels[iAlpha].Values[iValue] = clipByte((int)((fValue * fAlpha) * 255));
          }
     }
     return true;
}

For each of the 256 alpha levels, 256 values are generated, so the table holds 65,536 entries. For some unknown reason, he decided to do this math piecewise, normalising both inputs to the 0..1 range in float first. I guess that makes it easier to explain:

fAlpha = ((float)iAlpha) / 255;
fValue = ((float)iValue) / 255;
the actual value = (fAlpha * fValue) * 255;

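Let's substitute the first two lines into the third (float casts removed, so read this as plain algebra: in C's integer math, iAlpha / 255 on its own would truncate to 0 for every alpha below 255):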

the actual value = ((iAlpha / 255) * (iValue / 255)) * 255;

Combine the two fractions (255 * 255 = 65025):

the actual value = ((iAlpha * iValue) / 65025) * 255;

Multiply the 255 through, cancelling one factor of 255 in the denominator:

the actual value = (iAlpha * iValue) / 255;

BOOYA! One integer multiply and one integer divide per entry, as opposed to two of each in floating point (plus the int/float conversions). I win.
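To check that the simplified formula really can be swapped in, the sketch below computes each entry both ways and compares them. It is a standalone illustration, not Calyx code: float_entry, int_entry and check_simplification are made-up names, and clipByte is left out because these values never leave 0..255 anyway. The two can only disagree when the exact result is a whole number and float rounding lands a hair below it before the (int) cast truncates, so the check allows the integer version to come out 1 higher, and never lower.

```c
#include <assert.h>

/* His piecewise float formula (clipByte left out: the result is always 0..255). */
static int float_entry(int iAlpha, int iValue)
{
    float fAlpha = ((float)iAlpha) / 255;
    float fValue = ((float)iValue) / 255;
    return (int)((fValue * fAlpha) * 255);
}

/* The simplified formula. Multiply first: iAlpha / 255 on its own would
   truncate to 0 for every iAlpha below 255. */
static int int_entry(int iAlpha, int iValue)
{
    return (iAlpha * iValue) / 255;   /* at most 65025, well within a 32-bit int */
}

/* Walk all 65,536 entries. The integer result is never smaller than the
   float one and never more than 1 larger. */
void check_simplification(void)
{
    for (int a = 0; a < 256; a++)
        for (int v = 0; v < 256; v++) {
            int f = float_entry(a, v);
            int i = int_entry(a, v);
            assert(i >= f && i - f <= 1);
        }
}
```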

The clipByte function:

__inline unsigned __int8 clipByte(int value) {
     value = (0 & (-(int)(value < 0))) | (value & (-(int)!(value < 0)));
     value = (255 & (-(int)(value > 255))) | (value & (-(int)!(value > 255)));
     return value;
}

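The idea here is to squeeze a 32-bit int into 8 bits without branching. Strictly speaking it clamps rather than truncates: -(int)(cond) is all ones when cond is true and 0 otherwise, so negative values come out as 0 and anything above 255 comes out as 255. In the table, though, every entry is built from two inputs in 0..255 and lands in 0..255 itself, so it never goes negative or past 255 and the clamp never fires. There, keeping just the low 8 bits gives exactly the same answer.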

For values that are already in range, then, we can do this in a macro:

#define clipByte(x)     ((x) & 255)

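I imagine that compiles down to a single x86 instruction (it's just an 8-bit AND). The catch is that a mask wraps out-of-range values instead of clamping them (300 & 255 is 44, not 255), so it is only a drop-in replacement where the value can't leave 0..255.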
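A small standalone sketch makes the difference concrete. clamp_byte is a portable rewrite of the quoted clipByte (uint8_t in place of MSVC's unsigned __int8) and mask_byte is the macro above under another name. Both are renamed here only so they can sit side by side; neither name comes from Calyx.

```c
#include <assert.h>
#include <stdint.h>

/* Portable rendition of the quoted clipByte: a branchless clamp to 0..255.
   -(int)(cond) is all ones when cond holds and 0 when it doesn't, so each
   line keeps either the limit or the original value. */
static inline uint8_t clamp_byte(int value)
{
    value = (0 & -(int)(value < 0)) | (value & -(int)!(value < 0));
    value = (255 & -(int)(value > 255)) | (value & -(int)!(value > 255));
    return (uint8_t)value;
}

/* The proposed replacement macro: keep only the low 8 bits. */
#define mask_byte(x) ((x) & 255)

/* In range, the two are interchangeable... */
void check_in_range(void)
{
    for (int v = 0; v <= 255; v++)
        assert(mask_byte(v) == clamp_byte(v));
}

/* ...but out of range they are not:
   clamp_byte(300) == 255, mask_byte(300) == 44
   clamp_byte(-1)  == 0,   mask_byte(-1)  == 255 (two's complement) */
```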

And, now, my version of the additive alpha blend code:
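#include <stdint.h>

typedef uint8_t  uint8;      /* stand-ins for Calyx's own fixed-width typedefs */
typedef uint32_t uint32;

/* Extract one 8-bit channel. */
#define cb(x) ((x) & 255)
/* Clamp a channel sum at 255; masking would wrap it instead (300 & 255 == 44). */
#define cx_sat(x) ((x) > 255 ? 255 : (x))

/* _cx_alpha_values[a][v] == (a * v) / 255, i.e. v scaled by alpha a. */
uint8 _cx_alpha_values[256][256];

void cx_init_alpha(void) {
     int iv, ia;
     for (ia = 0; ia < 256; ia++)
          for (iv = 0; iv < 256; iv++)
               _cx_alpha_values[ia][iv] = (ia * iv) / 255;
}

/* Six-line alpha blend. Oh yeah.
   Additive: each colour channel is bottom + top * alpha(top) / 255, saturated
   at 255. Alpha sits in the top byte; the result is forced fully opaque. */
uint32 cx_alpha_blend(uint32 top, uint32 bottom)
{
     uint32 blend = 0;
     uint8 * st = _cx_alpha_values[(top >> 24)];
     blend |= cx_sat(st[cb(top >> 16)] + cb(bottom >> 16)) << 16;
     blend |= cx_sat(st[cb(top >> 8)] + cb(bottom >> 8)) << 8;
     blend |= cx_sat(st[cb(top)] + cb(bottom));
     return (blend | 0xff000000);
}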

Sadly, I can't correctly test it until I get enough of the Calyx framework together...
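Short of the full framework, the arithmetic can at least be exercised on its own. The following is a self-contained sketch, not code from the Calyx tree. It repeats the blend routine (with a saturating channel add and a full eight-digit 0xff000000 alpha mask), uses stand-in typedefs for uint8/uint32, and adds a hypothetical cx_blend_selftest() that checks a few hand-worked pixels.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  uint8;    /* stand-ins for Calyx's own typedefs */
typedef uint32_t uint32;

#define cb(x)     ((x) & 255)               /* extract one 8-bit channel */
#define cx_sat(x) ((x) > 255 ? 255 : (x))   /* clamp a channel sum at 255 */

static uint8 _cx_alpha_values[256][256];

void cx_init_alpha(void)
{
    for (int ia = 0; ia < 256; ia++)
        for (int iv = 0; iv < 256; iv++)
            _cx_alpha_values[ia][iv] = (uint8)((ia * iv) / 255);
}

/* Additive blend: bottom + top scaled by top's alpha, per channel. */
uint32 cx_alpha_blend(uint32 top, uint32 bottom)
{
    uint32 blend = 0;
    const uint8 *st = _cx_alpha_values[top >> 24];
    blend |= cx_sat(st[cb(top >> 16)] + cb(bottom >> 16)) << 16;
    blend |= cx_sat(st[cb(top >> 8)] + cb(bottom >> 8)) << 8;
    blend |= cx_sat(st[cb(top)] + cb(bottom));
    return blend | 0xff000000u;
}

/* Hypothetical standalone check: hand-worked pixels, no framework needed. */
void cx_blend_selftest(void)
{
    cx_init_alpha();
    /* Fully transparent top leaves the bottom colour alone (but opaque). */
    assert(cx_alpha_blend(0x00ffffffu, 0x00123456u) == 0xff123456u);
    /* Half alpha: 128 * 255 / 255 = 128 = 0x80, plus 0x20 below gives 0xa0. */
    assert(cx_alpha_blend(0x80ff0000u, 0xff202020u) == 0xffa02020u);
    /* 200 + 100 = 300 saturates at 0xff instead of wrapping to 0x2c. */
    assert(cx_alpha_blend(0xffc8c8c8u, 0xff646464u) == 0xffffffffu);
}
```

Had the sums been masked with & 255 and the alpha OR'd with the seven-digit 0xff00000, that last case would come out as 0x0ffc2c2c instead: every channel wraps to 0x2c, and the mask lands on the low nibble of the alpha byte and the high nibble of red rather than filling the alpha byte.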