This is the documentation for libdivide, a library for optimizing integer division. libdivide has both a C and a C++ interface; each is described in its own section below.

libdivide in C++

The C++ API leverages templates and operator overloading. If you don't like these things, feel free to use the C API, which works fine in C++.

An example is worth a thousand words, so jump in. Here is a function that divides many integers by a fixed integer, and returns their sum:

int sum_of_quotients(const int *numers, int count, int d) {
    int result = 0;
    for (int i=0; i < count; i++)
        result += numers[i] / d; //this division is slow!
    return result;
}

Here is how you would optimize it with libdivide in C++:

int sum_of_quotients(const int *numers, int count, int d) {
    int result = 0;
    libdivide::divider<int> fast_d(d); //constructs an instance of libdivide::divider
    for (int i=0; i < count; i++)
        result += numers[i] / fast_d; //uses faster libdivide division
    return result;
}

Despite the division operator, no division instructions are issued in the second version. The division operator is overloaded to use a multiply and a shift instead, with the multiplier and shift amount precomputed in the constructor of libdivide::divider.
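
The multiply-and-shift idea can be sketched in a few lines. What follows is a minimal illustration for uint32_t only, assuming the classic round-up method described by Granlund and Montgomery. It is not libdivide's actual implementation, and the names sketch_divider and sum_of_quotients_u32 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration type; this is NOT libdivide::divider.
class sketch_divider {
public:
    // Precompute the multiplier and shift amounts once, for a divisor d >= 1.
    explicit sketch_divider(std::uint32_t d) {
        assert(d >= 1);
        // l = ceil(log2(d)), so that 2^(l-1) < d <= 2^l.
        unsigned l = 0;
        while ((std::uint64_t{1} << l) < d) ++l;
        // magic = floor(2^32 * (2^l - d) / d) + 1; this always fits in 32 bits.
        const std::uint64_t excess = (std::uint64_t{1} << l) - d;
        magic_ = static_cast<std::uint32_t>((excess << 32) / d + 1);
        shift1_ = l < 1 ? l : 1;      // min(l, 1)
        shift2_ = l > 1 ? l - 1 : 0;  // max(l - 1, 0)
    }

    // Divide n by the stored divisor without a division instruction.
    std::uint32_t divide(std::uint32_t n) const {
        // High 32 bits of the 64-bit product magic_ * n.
        const std::uint32_t t =
            static_cast<std::uint32_t>((std::uint64_t{magic_} * n) >> 32);
        // t <= n, so n - t cannot underflow and the sum cannot overflow.
        return (t + ((n - t) >> shift1_)) >> shift2_;
    }

private:
    std::uint32_t magic_;
    unsigned shift1_;
    unsigned shift2_;
};

// Overloading operator/ lets call sites keep the familiar n / d syntax.
inline std::uint32_t operator/(std::uint32_t n, const sketch_divider &d) {
    return d.divide(n);
}

// Same shape as the example above, using the sketch type.
inline std::uint64_t sum_of_quotients_u32(const std::uint32_t *numers,
                                          std::size_t count, std::uint32_t d) {
    std::uint64_t result = 0;
    const sketch_divider fast_d(d);  // precomputation happens once, here
    for (std::size_t i = 0; i < count; i++)
        result += numers[i] / fast_d;  // multiply and shift, no division
    return result;
}
```

Because the multiplier and shift amounts depend only on d, the constructor cost is paid once, and each division inside the loop becomes a widening multiply, an add, a subtract, and two shifts.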

All of libdivide is contained in a single header file, in the libdivide namespace. The sole public class in this namespace is 'divider'. This class is a template, parameterized by the integer type you want to divide. Four types are supported: int32_t, int64_t, uint32_t, and uint64_t; any other type produces an error.

When dividing, the numerator may be a scalar of the same type as the denominator. If vector support is enabled, it may also be a vector type. Supported vector types are the corresponding NEON types (uint32x4_t, int32x4_t, uint64x2_t, int64x2_t) and the x86 types (__m128i, __m256i, __m512i).


libdivide in C

The C API takes the form of a family of regularly named functions. You use libdivide in C by passing the divisor to a generating function, which returns a struct. You then pass a dividend (numerator) and a pointer to the struct to a do function, which returns the resulting quotient.

Here is the same plain C function as before, which divides many integers by a fixed integer and returns the sum of the quotients:

int sum_of_quotients(const int *numers, int count, int d) {
    int result = 0;
    for (int i=0; i < count; i++)
        result += numers[i] / d; //this division is slow!
    return result;
}

Here is how you would optimize it with libdivide in C:

int sum_of_quotients(const int *numers, size_t count, int d) {
    int result = 0;
    struct libdivide_s32_t fast_d = libdivide_s32_gen(d);
    for (size_t i=0; i < count; i++)
        result += libdivide_s32_do(numers[i], &fast_d); // performs faster libdivide division
    return result;
}
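
The split between a generating function and a do function can be sketched as follows. This is a minimal illustration under stated assumptions, not libdivide's code: it handles signed 32-bit values by running an unsigned multiply-and-shift on absolute values and then restoring the sign, and every name beginning with sketch_ is hypothetical. It is written as C++ in a C-like style so that it matches the other sketch in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; these are NOT the libdivide_s32_* functions.
struct sketch_s32_t {
    std::uint32_t magic;  // precomputed multiplier for |d|
    unsigned shift1;      // first shift amount, min(l, 1)
    unsigned shift2;      // second shift amount, max(l - 1, 0)
    bool negative;        // true when the divisor is negative
};

// "gen" step: do all divisor-dependent work once. Requires d != 0.
inline sketch_s32_t sketch_s32_gen(std::int32_t d) {
    assert(d != 0);
    // Take |d| in 64 bits so that INT32_MIN is handled without overflow.
    const std::int64_t wide = d;
    const std::uint32_t abs_d = static_cast<std::uint32_t>(wide < 0 ? -wide : wide);
    // l = ceil(log2(|d|)).
    unsigned l = 0;
    while ((std::uint64_t{1} << l) < abs_d) ++l;
    const std::uint64_t excess = (std::uint64_t{1} << l) - abs_d;
    sketch_s32_t out;
    out.magic = static_cast<std::uint32_t>((excess << 32) / abs_d + 1);
    out.shift1 = l < 1 ? l : 1;
    out.shift2 = l > 1 ? l - 1 : 0;
    out.negative = d < 0;
    return out;
}

// "do" step: divide numer by the divisor captured in *denom, truncating
// toward zero like C's / operator. As in plain C, INT32_MIN / -1 is not
// representable and is not supported.
inline std::int32_t sketch_s32_do(std::int32_t numer, const sketch_s32_t *denom) {
    const std::int64_t wide = numer;
    const std::uint32_t n = static_cast<std::uint32_t>(wide < 0 ? -wide : wide);
    // High 32 bits of magic * |numer|, then the two-shift correction.
    const std::uint32_t t =
        static_cast<std::uint32_t>((std::uint64_t{denom->magic} * n) >> 32);
    const std::uint32_t q = (t + ((n - t) >> denom->shift1)) >> denom->shift2;
    // The quotient is negative exactly when the operand signs differ.
    const bool neg = (numer < 0) != denom->negative;
    const std::int64_t signed_q = neg ? -static_cast<std::int64_t>(q)
                                      : static_cast<std::int64_t>(q);
    return static_cast<std::int32_t>(signed_q);
}

// Same shape as the C example above, using the sketch functions.
inline std::int64_t sum_of_quotients_s32(const std::int32_t *numers,
                                         std::size_t count, std::int32_t d) {
    std::int64_t result = 0;
    const sketch_s32_t fast_d = sketch_s32_gen(d);
    for (std::size_t i = 0; i < count; i++)
        result += sketch_s32_do(numers[i], &fast_d);
    return result;
}
```

Keeping the precomputed state in a plain struct and passing it by pointer is what lets the same pattern be repeated for each integer type with only the names changing.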

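The four supported types are int32_t, uint32_t, int64_t, and uint64_t. The four generating functions are:

struct libdivide_s32_t libdivide_s32_gen(int32_t d);
struct libdivide_u32_t libdivide_u32_gen(uint32_t d);
struct libdivide_s64_t libdivide_s64_gen(int64_t d);
struct libdivide_u64_t libdivide_u64_gen(uint64_t d);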

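Similarly, there are four do functions. Each accepts a numerator and a pointer to the struct returned by the matching gen function, and returns the result of dividing the numerator by the denominator passed to that gen function:

int32_t libdivide_s32_do(int32_t numer, const struct libdivide_s32_t *denom);
uint32_t libdivide_u32_do(uint32_t numer, const struct libdivide_u32_t *denom);
int64_t libdivide_s64_do(int64_t numer, const struct libdivide_s64_t *denom);
uint64_t libdivide_u64_do(uint64_t numer, const struct libdivide_u64_t *denom);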

There are also do_vec functions, designed to integrate with vector intrinsics. The functions are named according to the vector width: 128 for SSE2 and NEON, 256 for AVX2, 512 for AVX512. Each accepts a vector of packed numerators (two or four in the 128-bit case) and returns a vector containing the result of dividing each by the denominator passed to the gen function. The names follow the do functions with a _vec128, _vec256, or _vec512 suffix, for example libdivide_s32_do_vec128.


Preprocessor defines

libdivide's behavior can be tweaked by a few preprocessor macros: