Class LongSwissHash

java.lang.Object
org.elasticsearch.swisshash.SwissHash
org.elasticsearch.swisshash.LongSwissHash
All Implemented Interfaces:
Closeable, AutoCloseable, LongHashTable, org.elasticsearch.core.Releasable

public final class LongSwissHash extends SwissHash implements LongHashTable
Assigns int ids to longs, vending the ids in order they are added.

At it's core there are two hash table implementations, a "small core" and a "big core". The "small core" is a simple open addressed hash table with a fixed 60% load factor and a table of 2048. It's quite quick because it has a fixed size and never grows.

When the "small core" has more entries than it's load factor the "small core" is replaced with a "big core". The "big core" functions quite similarly to a Swisstable, Google's fancy SIMD hash table. In this table there's a contiguous array of "control" bytes that are either 0b1111_1111 for empty entries or 0b0aaa_aaaa for populated entries, where aaa_aaaa are the top 7 bytes of the hash. To find an entry by key you hash it, grab the top 7 bytes or it, and perform a SIMD read of the control array starting at the expected slot. We use a fixed group size of 16, so use 128 bit SIMD instructions. If any of those match we check the actual key. So instead of scanning one slot at a time "small core", we effectively scan a whole bunch at once. This allows us to run a much higher load factor (87.5%) without any performance penalty so the extra byte feels super worth it.

When a "big core" fills it's table to the fill factor, we build a new "big core" and read all values in the old "big core" into the new one.

This class does not store the keys in the hash table slots. Instead, it uses a keyPages to store the actual values, and the hash table slots store the id which indexes into the keyPages.

  • Method Details

    • find

      public long find(long key)
      Finds an id by a key.
      Specified by:
      find in interface LongHashTable
    • add

      public void add(long[] keys, long[] ids, int length)
      Adds many keys at once, putting their ids into an array. If any key was already present it's previous assigned id will be added to the array. If it wasn't present it'll be assigned a new id.

      This method tends to be faster than add(long).

    • add

      public long add(long key)
      Add a key, returning its ids. If it was already present it's previous assigned id will be returned. If it wasn't present it'll be assigned a new id.
      Specified by:
      add in interface LongHashTable
    • status

      public SwissHash.Status status()
      Description copied from class: SwissHash
      Performance information hopefully useful for debugging.
      Specified by:
      status in class SwissHash
    • iterator

      public LongSwissHash.Itr iterator()
      Description copied from class: SwissHash
      Build an iterator to walk all values and ids.
      Specified by:
      iterator in class SwissHash
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface org.elasticsearch.core.Releasable
    • get

      public long get(long id)
      Returns the key at 0 <= id <= size(). The result is undefined if the id is unused.
      Specified by:
      get in interface LongHashTable
      Parameters:
      id - the id returned when the key was added
      Returns:
      the key