Counting people/unique visitors/users at web sites is a challenge, and is something that I've been working on for the past couple of months for the web site of a large media company. The goal is to count the number of distinct users over the course of a month. Counting distinct cookies is easy; the challenge is turning these into human beings. These challenges include:

- Cookie deletions. A user may manually delete their cookies one or more times during the month.

- Disallowing first party cookies. A user may allow session cookies (while the browser is running), but not allow the cookies to be committed to disk.

- Multiple browsers. A single user may use multiple browsers on the same machine during the month. This is particularly true when the user upgrades his or her browser.

- Multiple machines. A single user may use multiple machines during the month.

And, I have to admit, that the data that I'm using has one more problem, which is probably not widespread. The cookies are actually hashed into four bytes. This means that it is theoretically possible for two "real" cookies to have the same hash value. Not only theoretically possible, but it happens (although not too frequently).

I came across a very good

blog by Angie Brown that lays out the assumptions in making the calculation, including a spreadsheet for varying the assumptions. One particularly interesting factoid from the blog is that the number cookies that appear only once during the month exceeds the number of unique visitors, even under quite reasonable assumptions. Where I am working, one camp believes that the number of unique visitors is approximated by the number of unique cookies.

A white paper by ComCast states that the average user has 2.5 unique cookies per month due to cookie deletion. The paper is

here, and a PR note about it is it is

here. This paper is widely cited, although it has some serious methodological problems due to the fact that its data sources are limited to DoubleClick and Yahoo!.

In particular, Yahoo! is quite clear about its cookie expiration policies (

two weeks for users clicking the "keep me logged in for 2 weeks" box and

eight hours for Yahoo! mail). I do not believe that this policy has changed significantly in the last few years, although I am not 100% sure.

The white paper from ComCast does not mention these facts, which means that most of the cookies that a user has are due to automatic deletion, not user behavior. How many distinct cookies does a user have, due only to the user's behavior?

If I make the following assumptions:

- The Yahoo! users have an average of 2.5 cookies per month.

- ComCast used the main Yahoo! cookies, and not the Yahoo! mail cookies.

- All Yahoo! users use the site consistently throughout the month.

- All Yahoo! users have the "keep me logged in for 2 weeks" box checked.

Then I can estimate the number of cookies per user per machine per month. The average user would have 31/14 = 2.2 cookies per month, strictly due to the automatic deletion. This leaves 0.3 cookies per month due to manual deletion. Of course, the user starts with one cookie. So the average number of cookies per month per user per machine is 1.3.

By the way, I find this number much more reasonable. I also think that it misses the larger source of overcounting -- users who use more than one machine. Unfortunately, there is no single approach. In the case that I'm working on, we have the advantage that a minority of users are registered, so we can use them as a sample.

Labels: gordon, Web analytics