Group VarInt

In Chapter 6, we discussed the byte-aligned vByte method as an example of an index compression technique with high decoding performance. After we had written the chapter, Jeff Dean gave a keynote talk at WSDM 2009 in which he discussed a few compression algorithms used by Google, including the byte-aligned Group VarInt method.
Group VarInt is named after VarInt (variable-size integer coding), Google's version of the vByte algorithm. vByte's decoding routine is very efficient, but often suffers from branch misprediction. Consider the vByte representation of the postings list L = 〈80, 400, 431, 686〉:
In the vByte encoding, each byte carries seven payload bits plus one continuation bit. When working through the encoded postings sequence, the decoder needs to inspect every single continuation bit and, depending on whether it is 1 or 0, either continue with the current posting or skip to the next. This means there is a branch instruction, and thus the possibility of a branch misprediction, for every byte in the encoded postings sequence.
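To make the per-byte branch concrete, here is a minimal vByte decoder sketch. It assumes the common convention that the high bit of each byte is the continuation flag, set when more bytes follow (the original figures may use a different convention); the decoded values are the Δ-values of the postings list.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode n vByte-compressed Δ-values from `in` into `out`.
   Convention assumed here: low 7 bits of each byte are payload,
   the most significant bit is 1 iff more bytes follow.
   Returns a pointer just past the consumed input. */
static const uint8_t *vbyte_decode(const uint8_t *in, uint32_t *out, size_t n) {
  for (size_t i = 0; i < n; i++) {
    uint32_t value = 0;
    int shift = 0;
    uint8_t b;
    do {
      b = *in++;
      value |= (uint32_t)(b & 0x7F) << shift;  /* append 7 payload bits */
      shift += 7;
    } while (b & 0x80);  /* the branch on the continuation bit: this is
                            what mispredicts when value lengths vary */
    out[i] = value;
  }
  return in;
}
```

For the Δ-sequence 〈80, 320, 31, 255〉, this convention yields the six bytes 0x50, 0xC0 0x02, 0x1F, 0xFF 0x01.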
Group VarInt addresses this problem in two steps:

1. Postings are grouped into blocks of four, and each Δ-value in a block is encoded in 1, 2, 3, or 4 whole bytes, with no continuation bits.
2. Each block is preceded by a one-byte selector containing four 2-bit fields, one per posting, where each field holds the number of bytes (minus 1) used to encode the corresponding posting.
For example, the second Δ-value, 320, can be encoded using 2 bytes, so its 2-bit selector value is 01.
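The selector computation can be sketched as follows. Note that the byte order of the payload (little-endian here) and the position of each 2-bit field within the selector byte are our own assumptions for illustration; implementations differ on both points.

```c
#include <stdint.h>

/* Encode four Δ-values as one Group VarInt chunk: a selector byte
   followed by 4-16 payload bytes. Assumptions (illustrative only):
   payload bytes are little-endian, and value i's 2-bit length field
   occupies bits 2i..2i+1 of the selector. Returns a pointer just
   past the written output. */
static uint8_t *gvi_encode_group(const uint32_t *in, uint8_t *out) {
  uint8_t *sel = out++;  /* reserve the selector byte */
  *sel = 0;
  for (int i = 0; i < 4; i++) {
    uint32_t v = in[i];
    int len = (v < 0x100) ? 1 : (v < 0x10000) ? 2 : (v < 0x1000000) ? 3 : 4;
    *sel |= (uint8_t)((len - 1) << (2 * i));   /* 2-bit length field */
    for (int b = 0; b < len; b++)              /* little-endian payload */
      *out++ = (uint8_t)(v >> (8 * b));
  }
  return out;
}
```

Under these assumptions, the group 〈80, 320, 31, 255〉 occupies six bytes: the selector 0x04 (fields 00, 01, 00, 00), then 0x50, 0x40 0x01, 0x1F, 0xFF.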
Publicly available implementations of Group VarInt typically use a precomputed lookup table to determine the position of each posting in the encoded data chunk and the bit mask to apply when extracting it. In our own experiments, we found that a little bit-shifting works just as well.
The following decoder implementation assumes a little-endian architecture that supports unaligned memory access (e.g., Intel/AMD); on other architectures, the decoding operations may be slightly more complicated.
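The original listing is not reproduced here, but a decoder along these lines can be sketched as follows. The selector layout (value i's 2-bit length field in bits 2i..2i+1, little-endian payload) is an assumption for illustration, and the unaligned 4-byte load means the input buffer needs up to 3 bytes of padding after the last chunk.

```c
#include <stdint.h>
#include <string.h>

/* Decode one Group VarInt chunk (a selector byte plus four values).
   Assumes a little-endian machine; the memcpy compiles to a single
   unaligned 32-bit load on Intel/AMD. Because it always reads 4 bytes,
   the input buffer must be padded with at least 3 extra bytes.
   Returns a pointer just past the consumed input. */
static const uint8_t *gvi_decode_group(const uint8_t *in, uint32_t *out) {
  uint8_t sel = *in++;
  for (int i = 0; i < 4; i++) {
    int len = ((sel >> (2 * i)) & 3) + 1;          /* 1..4 bytes */
    uint32_t v;
    memcpy(&v, in, 4);                             /* unaligned load */
    out[i] = v & (0xFFFFFFFFu >> (8 * (4 - len))); /* keep low `len` bytes */
    in += len;
  }
  return in;
}
```

Note that the loop contains no data-dependent branches: the selector byte alone determines the length of every value, which is what eliminates vByte's per-byte misprediction risk.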
If the number of postings in the compressed list is not a multiple of 4, then special-case handling may be necessary for the final 1–3 postings in the list — for instance, by encoding those postings using vByte.
The following table is identical to Table 6.9 (page 213) in the book, except for the addition of Group VarInt:
Perhaps surprisingly, the performance difference between vByte and Group VarInt in the above table is very small. The reason is that the vast majority of Δ-values in the docid index for GOV2 require 7 bits or less, so most postings fit in a single byte and vByte's continuation-bit branch is predicted correctly almost every time. To evaluate the performance difference between vByte and Group VarInt on an index in which the Δ-values vary over a larger range, we repeated the experiment from Table 6.9 for a schema-independent index instead of a docid index:
Compared to the docid index, the decoding overhead for vByte increases by 3.0 ns per list element (+221%). The decoding overhead for Group VarInt, on the other hand, grows by only 0.8 ns per list element (+76%). Thus, while vByte may be an appropriate compression method for docid indices, since it is almost as fast as Group VarInt and slightly more compact, it is not ideal for schema-independent postings lists, where Group VarInt delivers much better decoding performance. If postings lists are stored on disk, one may still argue that vByte is the superior compression method, as it achieves better compression rates. However, if the index is kept in memory, then the factor-2.3 difference in decoding performance makes Group VarInt the clear winner.