Set Operations On HLLs of Different Sizes

Introduction

Here at AK, we’re in the business of storing huge amounts of information in the form of 64 bit keys. As shown in otherblogposts and in the HLL post by Matt, one efficient way of getting an estimate of the size of the set of these keys is by using the HyperLogLog (HLL) algorithm.  There are two important decisions one has to make when implementing this algorithm.  The first is how many bins one will use and the second is the maximum value one allows in each bin.  As a result, the amount of space this will take up is going to be the number of bins times the log of the maximum value you allow in each bin.  For this post we’ll ignore this second consideration and focus instead on the number of bins one uses.  The accuracy for an estimate is given approximately by 1.04/√b

View original post 889 more words

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s