• 6 Posts
  • 95 Comments
Joined 1 year ago
cake
Cake day: June 10th, 2023

help-circle















  • I believe the optimization came because the denominator was a power of two. In my memory, the function counted up all of the bytes being sent and checked to see that the sum was a power of 16 (I think 16 bytes made a single USB endpoint or something; I still don’t fully understand USB).

    For starters, you can split up a larger modulo into smaller ones:

    X = (A + B); X % n = (A % n + B % n) % n

    So our 16 bit number X can be split into an upper and lower byte:

    X = (X & 0xFF) + (X >> 8)

    so

    X % 16 = ((X & 0xFF) % 16 + (X >>8) % 16) % 16

    This is probably what the compiler was doing in the background anyway, but the real magic came from this neat trick:

    x % 2^n = x & (2^n - 1).

    so

    x % 16 = x & 15

    So a 16 bit modulo just became three bitwise ANDs.

    Edit: and before anybody thinks I’m good a math, I’m pretty sure I found a forum post where someone was solving exactly my problem, and I just copy/pasted it in.

    Edit2: I’m pretty sure I left it here, but I think you can further optimize by just ignoring the upper byte entirely. Again, only because 16 is a power of 2 and works nicely with bitwise arithmatic.