Cosine redux

It's been quite a while since my last blog entry: I had a bit of a technology meltdown that I'm not quite done with yet :-(

I was doing some recreational hacking over the holidays that involved evaluating cosines. I ended up doing (once again!) an implementation of cosine (don't ask why). Given the confused flamage that my previous posts on cosine generated, I figure that showing some code would be useful. I would never have thought that cosine would generate so much email traffic. Yes, I know about Taylor series. No I wasn't trying to insult centuries of standard mathematical practice. Performance is often the art of cheating carefully.

So, here's my implementation of cosine, with its argument in turns (1 turn == 360 degrees):

public static float cost(float f) {
    int bits = Float.floatToRawIntBits(f);
    int mantissa = (bits&0x7FFFFF)|(1<<23);
    int shift = (bits<<1>>>24)-127+9; // exponent, unbiased, with shift
    if(shift>=32 || shift<=-32) return 1;
    int fractionBits = shift>=0 ? mantissa<<shift : mantissa>>-shift;
    int tableIndex = (fractionBits>>>(30-resultPrecision))&tableSizeMask;
    switch(fractionBits>>>30) { // Quadrant is top two bits
        case 0: return cosTable[tableIndex];
        case 1: return -cosTable[tableSizeMask-tableIndex];
        case 2: return -cosTable[tableIndex];
        default/*case 3*/: return cosTable[tableSizeMask-tableIndex];
Lets go through this slowly:
  1. int bits = Float.floatToRawIntBits(f);
    Get the IEEE 754 bits
  2. int mantissa = (bits&0x7FFFFF)|(1<<23);
    The mantissa is the bottom 23 bits - to which the hidden bit must be prepended.
  3. int shift = (bits<<1>>>24)-127+9;
    Extract the exponent, correct for the exponent bias, then add a bias to move the binary point to the top of the word.
  4. if(shift>=32 || shift<=-32) return 1;
    If the shift is too large, the fraction bits would be zero, therefore the result is 1.
  5. int fractionBits = shift>=0 ? mantissa<<shirt : mantissa>>-shift;
    Shift the mantissa so that it's a fixed point number with the binary point at the top of the 32 bit int. The magic is in what's not here: because the argument is in turns, I get to ignore all of the integer bits (range reduction made trivial); and because it's the cosine function, which is symmetric about the origin, I get to ignore the sign.
  6. int tableIndex = (fractionBits>>>(30-resultPrecision))&tableSizeMask;
    The top two bits are the quadrant... extract the bits below that to derive a table index.
  7. switch(fractionBits>>>30) {
    One case for each quadrant. This switch could be eliminated by making the table 4 times larger.
  8. case 0: return cosTable[tableIndex];
    Yes! It's just a table lookup! Truly trivial.
  9. case 1: return -cosTable[tableSizeMask-tableIndex];
Since this is just a table lookup, the resulting approximation can be pretty jagged if the table is small. But it's easy to tune the table size depending on accuracy needs. The smaller the table is, the higher the cache hit rate will be, and the more likely it is that the whole table will fit in cache.

Table lookups are a very common way to implement mathematical functions, particularly the periodic ones like cosine. There are all kinds of elaborations. One of the most common for improving accuracy is to do some sort of interpolation between table elements (linear or cubic, usually).

January 20, 2006