Thursday, 4 June 2015

Gregorian Date Calculation Golf 2a

My previous attempt at a Gregorian Date Calculation Golf routine to calculate the number of days in a month can be found here. However, this thread has an interesting discussion about optimizing the calculation in JavaScript. It turns out that using a shift-value to encode the twelve-element table is not new at all; though the following treats all instances of February as special cases and can therefore pack all the remaining months into just one bit each:

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return 30 + ((5546 >> month) & 1);
}

A potentially superior (in JavaScript) alternative is also given:

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return 30 + ((month + (month >> 3)) & 1);
}

Although both these algorithms are slower than a simple table lookup in C/C++ (on my machine), the latter is marginally quicker than my previous 32-bit and 64-bit Golf attempts.

However, even with these elegant solutions, there's always room for "improvement":

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return (month + (month >> 3)) | 30;
}

Wednesday, 3 June 2015

Gregorian Date Calculation Golf 4

Calculating a Gregorian year/month/day triplet from a Rata Die integer involves finding our temporal position within a number of cycles:

  • 146097 days for the 400-year Gregorian leap year rule cycle,
  • 1461 days for the 4-year Julian leap year rule cycle,
  • 12 months in a year, and
  • 30.6 days in the days-in-a-month approximation (see here)
This generally means a lot of divisions and modular arithmetic, which causes no end of problems for Gregorian Date Calculation Golf, particularly Rule 8.

Let's start with the canonical Richards' algorithm:


void FromRataDie(int rd, int& year, int& month, int& day) {
    int jdn = rd + 1721425;
    assert(jdn >= 0);
    int f = jdn + 1401 + (((4 * jdn + 274277) / 146097) * 3) / 4 - 38;
    int e = 4 * f + 3;
    int g = (e % 1461) / 4;
    int h = 5 * g + 2;
    day = (h % 153) / 5 + 1;
    month = (h / 153 + 2) % 12 + 1;
    year = (e / 1461) - 4716 + (14 - month) / 12;
    assert(year >= -4713);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

We've already seen how we can get rid of most of these divisions using bitwise shifts or by using different divisors entirely (instead of 153 and 12, for instance):

void FromRataDie(int rd, int& year, int& month, int& day) {
    assert(rd >= 1);
    int a = rd << 2;
    int b = (a + 147321) / 146097;
    int c = ((a + b * 3) | 3) + 1220;
    int d = c / 1461;
    int e = c - d * 1461;
    int f = (e >> 2) * 2141 + 197688;
    int g = f >> 16;
    int h = (g + 3) >> 4;
    day = ((f & 0xFFFF) / 2141) + 1;
    month = g - h * 12;
    year = d + h;
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

The rather curious bitwise-or with three in the calculation of "c" is due to the observation:

    (x | 3) == (((x / 4) * 4) + 3)


The new constant 2141 is due to the following approximation:

    153 ÷ 5 ~= 65536 ÷ 2141

Alas, our "Division by Integer Constants" trick must resort to 64-bit quantities to obtain a true Golf solution:

void FromRataDie(int rd, int& year, int& month, int& day) {
    assert((rd >= 1) && (rd <= 23936166));
    uint64_t a = uint64_t(rd << 2);
    uint64_t b = (((a + 147321) * 0xE5AC1AF3) >> 49);
    uint64_t c = ((a + b * 3) | 3) + 1220;
    int d = int((c * 0xB36D8398) >> 42);
    int e = int(c - d * 1461);
    int f = (e >> 2) * 2141 + 197688;
    int g = f >> 16;
    int h = (g + 3) >> 4;
    day = (((f & 0xFFFF) * 0x7A71) >> 26) + 1;
    month = g - h * 12;
    year = d + h;
    assert((year >= 1) && (year <= 65535));
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

One interesting thing about this algorithm is that none of the constants (with the possible exceptions of 12 and 1461 = 365.25 * 4) look like they've got anything to do with date calculations. Indeed, it looks more like a pseudo-random number generator.

Computing the year and the day-of-year (1 to 366) from the Rata Die integer is left as an exercise for the reader.

Gregorian Date Calculation Golf 3

So let's convert a Gregorian year/month/day triplet to a serial day number. To make things simple, I'm going to expression serial day numbers as Rata Die integers; i.e. the number of calendar days since the day before January 1, 1 C.E.

This means that January 1, 1 C.E. is RD=1 and June 1, 2015 is RD=735750.

Here's the classic Julian Day Number (JDN) calculation offset for the Rata Die scheme:

int ToRataDie(int year, int month, int day) {
    assert(year >= -4713);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (14 - month) / 12;
    int y = year + 4800 - a;
    int m = month + 12 * a - 3;
    int jdn = day + ((153 * m + 2) / 5) + 365 * y +
              (y / 4) - (y / 100) + (y / 400) - 32045;
    return jdn - 1721425;
}

Of course, for Gregorian Date Calculation Golf we want to eradicate those divisions. Fortunately, we're only interested in years between 1 and 65535 inclusive.

The calculation of "a" calls for a division by 12, but careful inspection of the logic uncovers:

    a == 1 for January and February
    a == 0 for all other months

So we can replace that division with another formula that achieves the same result, say:

    a = (18 - month) >> 4

Another culprit is the division by 5 in the calculation of the days until the start of the month:

    (153 * m + 2) / 5

By referring to the excellent Calendrical Calculations, we discover that the constants 153, 2 and 5 in the above equation are themselves somewhat arbitrary (see pp.53-54, but also Zeller's Congruence) and choices that better suit computer hardware can be used instead.

All this, along with a considerable amount of fettling, leads to the following implementation:

int ToRataDie(int year, int month, int day) {
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (18 - month) >> 4;
    int b = month + 12 * a + 1;
    int c = year - a;
    int d = c / 100;
    int e = (b * 1959) >> 6;
    int f = (c * 1461) >> 2;
    return day - d + (d >> 2) + e + f - 428;
}

Note that non-positive (non-Common Era) years are no longer supported; though this restriction can be relaxed relatively easily by offsetting the year and the result correspondingly.

We've already seen how to remove the division by 100 in our discussion of "IsLeapYear()", so our Golf solution is:

int ToRataDie(int year, int month, int day) {
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (18 - month) >> 4;
    int b = month + 12 * a + 1;
    int c = year - a;
    int d = (((c * 0x47AF) >> 16) + c) >> 7;
    int e = (b * 1959) >> 6;
    int f = (c * 1461) >> 2;
    return day - d + (d >> 2) + e + f - 428;
}

If you supply the day-of-the-year (1 to 366) instead of the month and day, the conversion is pleasingly simple:

int ToRataDie(int year, int dayOfYear) {
    assert(year >= 1);
    assert((dayOfYear >= 1) && (dayOfYear <= 366));
    int c = year - 1;
    int d = (((c * 0x47AF) >> 16) + c) >> 7;
    int f = (c * 1461) >> 2;
    return dayOfYear - d + (d >> 2) + f;
}

Gregorian Date Calculation Golf 2

If you browse the source code of a well-known, much-used library, you'll come across a bit of code which I'll paraphrase below:

static const int daysToMonth365[13] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365
};
static const int daysToMonth366[13] = {
    0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366
};
int DaysInMonth(int year, int month) {
    const int* days = IsLeapYear(year) ? daysToMonth366
                                       : daysToMonth365;
    return days[month] - days[month - 1];
}

So, what's wrong with this code. Well, frankly, nothing at all ... unless you're playing Gregorian Date Calculation Golf.

The main problem is that, as any schoolchild can tell you, you only need to know if the year is a leap year when asked for the number of days in February. The code above calls "IsLeapYear(year)" for all months. Determining if a year is actually a leap year is (relatively) non-trivial, so things seem to be less than optimal. Another point is that the constant arrays "daysToMonth365" and "daysToMonth366" are used in other algorithms in their library, but the subtraction due to re-use seems to be slightly heavy-handed. Optimizing compilers understandably have a tough time improving matters.

Here's another stab at the function:

static const int daysInMonth365[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
int DaysInMonth(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    return daysInMonth365[month - 1];
}

On my machine, this is actually 40% faster than the original implementation without loss of readability. Please remember: this is an academic exercise; I'm not suggesting that this code is necessarily more appropriate.

In the true spirit of Gregorian Date Calculation Golf, a pedant might point out that the table memory references may have an adverse effect on CPU pipelining and caching (but only if they are blind to the enormous "if" statement preceding it!). If that's truly the case, the following implementation packs the table into a 64-bit constant that gets embedded in the machine code:

int DaysInMonth64(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const uint64_t magic = 0x0FFBFEFFFDFF7F9F;
    return int(magic >> ((month - 1) * 5)) & 31;
}

Again, on my machine compiled for x64, this is a smidgen faster than using the array-lookup. Not surprisingly, it performs terribly on 32-bit instruction sets because of the 64-bit shift involved.

A non-table version can be written for 32-bit instruction sets by storing each table entry as two bits:

int DaysInMonth32(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const int magic = 0x03BBEECC;
    return ((magic >> (month << 1)) | -4) + 32;
}

But, although beautiful in its obfuscation, it proves rarely to be faster than using a table lookup.

A quick google for those magic numbers scores no hits, which suggests that these algorithms may be quite novel. I wonder why...

Saturday, 30 May 2015

Gregorian Date Calculation Golf 1

It's been a while since my last post to this blog, so I thought I'd return to a guilty pleasure of mine: Needless optimizations of date calculations. See Computus 3.

Date calculation optimizations are often overlooked because of the scope for subtle errors. If you are writing production code that is going to be used in anger, I strongly suggest you check and re-check everything you see hereafter. At the very least, run it through a thorough test suite.

So, what are the rules of "Gregorian Date Calculation Golf"? Well...

  1. Pseudo-code must be presented in C/C++/C# flavour.
  2. The integer type "int" is expected to be at least 32 bits long.
  3. The integer type "long" is expected to be at least 64 bits long.
  4. A (proleptic) Gregorian year is assumed to be between 1 (meaning 1 CE) and 65535 inclusive.
  5. A Gregorian month is assumed to be between 1 (January) and 12 (December) inclusive.
  6. Leap days must be handled correctly.
  7. Time and timezones are ignored.
  8. Integer, arithmetic divisions (and their cousins) are not permitted.
Rule 8 is where the fun starts!

To warm up, let's assume we want to decide whether a Gregorian year is a leap year.

According to Wikipedia, the logic is:
if (year is not exactly divisible by 4) then (it is a common year)
else
if (year is not exactly divisible by 100) then (it is a leap year)
else
if (year is not exactly divisible by 400) then (it is a common year)
else (it is a leap year)
 So here's an obvious implementation:

  bool IsLeapYear(int year) {
    if ((year % 4) != 0) {
      return false;
    }
    if ((year % 100) != 0) {
      return true;
    }
    if ((year % 400) != 0) {
      return false;
    }
    return true;
  }

However, the "%" modulo operator involves an implicit division, violating our Rule 8. We can remove two of the divisions by using the bitwise-and operator "&":

  bool IsLeapYear(int year) {
    if ((year & 3) != 0) {
      return false;
    }
    int century = year / 100;
    if ((century * 100) != year) {
      return true;
    }
    if ((century & 3) != 0) {
      return false;
    }
    return true;
  }

However, there's still a division by a constant (100). Fortunately, we can use the fact that:
x / c = x * (1 / c)
This technique is known as "Integer Division by Constants". Indeed, any optimizing compiler worth its salt will convert the division in the code above to a multiplication. But, in the spirit of Gregorian Date Calculation Golf, we'll do it explicitly.

  bool IsLeapYear(int year) {
    if ((year & 3) != 0) {
      return false;
    }
    int century = (((year * 0x47AF) >> 16) + year) >> 7;
    if ((century * 100) != year) {
      return true;
    }
    if ((century & 3) != 0) {
      return false;
    }
    return true;
  }

This can be needlessly re-factored to:

  bool IsLeapYear(int year) {
    if ((year & 3) == 0)
    {
      int century = (((year * 0x47AF) >> 16) + year) >> 7;
      return ((century * 100) != year) || ((century & 3) == 0);
    }
    return false;
  }

So, can we convert Gregorian year/month/day triplets to and from serials (i.e. the number of days since an epochal date)? Of course we can! Next time.

Saturday, 6 September 2014

RGB/HSV in HLSL 6

It is often common to need to take a colour image and change some aspect of it: its hue, saturation, brightness or contrast.

You could simply convert the image to an appropriate colour space (e.g. HSV) perform the manipulations and then convert back to RGB. However, of the three components of the HSV colour space, it is least likely that you'll want to change the hue, which is trickiest component to convert. So can you change the saturation and value without performing the full round-trip conversions?

Remember that for RGB to HSV:

  float U = min(R, min(G, B));
  float V = max(R, max(G, B));
  float C = V - U;

  float S = C / (V + Epsilon);

The hue 'H' is given by the difference between two RGB channel values divided by 'V'.

So if we maintain the absolute differences between RGB channel values, we maintain the chroma 'C' (because the 'V' and 'U' values change by the same amount) but not necessarily the HSV saturation 'S' (because the denominator 'V' changes).

This implies we can easily change the chroma (or HSV saturation) by manipulating 'U' without any impact on the value 'V'. However, changing the value 'V' (or lightness) is not so easy without also changing the chroma (or HSV saturation).

Consider another formulation of HSV saturation (without the epsilon term):

  float S = C / V;
          = (V - U) / V;
          = 1 - (U / V);

It is dependent on the ratio of the minimum and maximum RGB channel values. So if we maintain this ratio, we do not affect the HSV saturation.

Anyway, let's start with what appears to be the easier modification first. Suppose we have an RGB triplet and we want to change its HSV saturation to 'S_wanted':

  float U = min(R, min(G, B));
  float V = max(R, max(G, B));

  float Q = S_wanted * V / (V - U + Epsilon);
  RGB = (V - RGB) * Q;

Above, 'Q' is equivalent to 'S_wanted / S' and 'Epsilon' is an appropriately tiny number, e.g. 1e-10.

Now the trickier modification. Suppose we have an RGB triplet and we want to change its HSV value to 'V_wanted' without changing its HSV saturation (or hue). We can multiply all channel values by
'V_wanted / V' and then apply the saturation modification algorithm above to restore the original saturation. But wait! The channel multiplication does not change the ratio of the minimum and maximum RGB channel values, so it therefore does not change the saturation after all. Only a simple multiplicative scaling is required:

  float V = max(R, max(G, B));
  RGB *= V_wanted / (V + Epsilon);

Please remember that we're talking about simplifying modifications in the HSV colour space here. HSL is another matter entirely. 

Monday, 25 August 2014