Understanding Entropy

In information theory, entropy is a measure of uncertainty about information. We can use it to determine how much a given piece of data increases the likelihood of identifying a given device. You can think of entropy as the number of bits needed to distinguish between the values a random variable can take, assuming each value is equally likely: two possibilities give 1 bit of entropy, four possibilities give 2 bits, and so on. Since there are about 2.2 billion computers on Earth, you need about 31 bits of entropy (2^31 ≈ 2.1 billion) to exactly identify a random computer. Often you can get away with less, because you only need enough values to distinguish the computers that will visit your site, which, unless you are Facebook, falls far short of 2.2 billion.
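As a rough illustration of the arithmetic above, the following Python sketch computes how many bits are needed to single out one member of a population, assuming every member is equally likely. The function name bits_to_identify and the one-million-visitor figure are hypothetical and used only for illustration.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of entropy needed to single out one member of a population,
    assuming all members are equally likely."""
    if population < 1:
        raise ValueError("population must be at least 1")
    return math.log2(population)


if __name__ == "__main__":
    print(bits_to_identify(2))                        # 1.0
    print(bits_to_identify(4))                        # 2.0
    print(round(bits_to_identify(2_200_000_000), 2))  # 31.03

    # Hypothetical site with one million distinct visiting devices:
    # far fewer bits are enough to tell them apart.
    print(math.ceil(bits_to_identify(1_000_000)))     # 20
```

Real populations are not uniformly distributed, so these figures are best read as rough targets, not exact requirements.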

Each additional data point you collect reduces the entropy by a certain amount, which you can calculate using this formula: ΔS = -log2 Pr(X = x), where ΔS is the entropy decrease expressed in bits and Pr(X = x) is the probability that the observed fact X takes the value x. For example, for a date of birth (day and month): ΔS = -log2 Pr(DOB = 01.11) = -log2(1/365) ≈ 8.51 bits of information. (See "How Unique Is Your Web Browser?" by P. Eckersley.)
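The formula can be sketched in a few lines of Python. The helper names surprisal_bits and total_bits are hypothetical, and the second helper assumes the facts are statistically independent; correlated facts (for example, operating system and installed fonts) contribute fewer bits than the plain sum suggests.

```python
import math
from typing import Iterable


def surprisal_bits(probability: float) -> float:
    """Entropy decrease in bits from learning a fact that occurs
    with the given probability: -log2 Pr(X = x)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def total_bits(probabilities: Iterable[float]) -> float:
    """Sum of the entropy decreases of several facts.
    Assumption: the facts are independent of each other."""
    return sum(surprisal_bits(p) for p in probabilities)


if __name__ == "__main__":
    # Date of birth (day and month), assuming 365 equally likely days.
    print(f"{surprisal_bits(1 / 365):.2f}")     # 8.51

    # Adding a hypothetical independent fact shared by half of all devices
    # contributes exactly one more bit.
    print(f"{total_bits([1 / 365, 0.5]):.2f}")  # 9.51
```

The rarer a value is, the more bits it reveals: a fact shared by every device (probability 1) reveals nothing.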

Ideally, the data points you select for your device print are slow to change. For example, the User-Agent string is highly distinctive and provides a lot of entropy, but it changes with each browser version update. The host operating system changes less frequently and therefore may allow tracking over a longer period.