Monday, December 28, 2009

Differential Response or Uplift Modeling

Some time before the holidays, we received the following inquiry from a reader:

Dear Data Miners,



I’ve read interesting arguments for uplift modeling (also called incremental response modeling) [1], but I’m not sure how to implement it. I have responses from a direct mailing with a treatment group and a control group. Now what? Without data mining, I can calculate the uplift between the two groups but not for individual responses. With the data mining techniques I know, I can identify the ‘do not disturbs,’ but there’s more than avoiding mailing that group. How is uplift modeling implemented in general, and how could it be done in R or Weka?



[1] http://www.stochasticsolutions.com/pdf/CrossSell.pdf

I first heard the term "uplift modeling" from Nick Radcliffe, then of Quadstone. I think he may have invented it. In our book, Data Mining Techniques, we use the term "differential response analysis." It turns out that "differential response" has a very specific meaning in the child welfare world, so perhaps we'll switch to "incremental response" or "uplift" in the next edition. But whatever it is called, you can approach this problem in a cell-based fashion without any special tools. Cell-based approaches divide customers into cells or segments in such a way that all members of a cell are similar to one another along some set of dimensions considered to be important for the particular application. You can then measure whatever you wish to optimize (order size, response rate, . . .) by cell and, going forward, treat the cells where treatment has the greatest effect.

Here, the quantity  to measure is the difference in response rate or average order size between treated and untreated groups of otherwise similar customers. Within each cell, we need a randomly selected treatment group and a randomly selected control group; the incremental response or uplift is the difference in average order size (or whatever) between the two. Of course some cells will have higher or lower overall average order size, but that is not the focus of incremental response modeling. The question is not "What is the average order size of women between 40 and 50 who have made more than 2 previous purchases and live in a neighborhood where average household income is two standard deviations above the regional average?" It is "What is the change in order size for this group?"

Ideally, of course, you should design the segmentation and assignment of customers to treatment and control groups before the test, but the reader who submitted the question has already done the direct mailing and tallied the responses. Is it now too late to analyze incremental response?  That depends: If the control group is a true random control group and if it is large enough that it can be partitioned into segments that are still large enough to provide statistically significant differences in order size, it is not too late. You could, for instance, compare the incremental response of male and female responders.

A cell-based approach is only useful if the segment definitions are such that incremental response really does vary across cells. Dividing customers into male and female segments won't help if men and women are equally responsive to the treatment. This is the advantage of the special-purpose uplift modeling software developed by Quadstone (now Portrait Software). This tool builds a decision tree where the splitting criteria is maximizing the difference in incremental response. This automatically leads to segments (the leaves of the tree) characterized by either high or low uplift.  That is a really cool idea, but the lack of such a tool is not a reason to avoid incremental response analysis.

Labels: , , ,

Thursday, May 1, 2008

Statistical Test for Measuring ROI on Direct Mail Test

If I want to test the effect of return of investment on a mail/ no mail sample, however, I cannot use a parametric test since the distribution of dollar amounts do not follow a normal distribution. What non-parametric test could I use that would give me something similar to a hypothesis test of two samples?

Recently, we received an email with the question above. Since it was addressed to bloggers@data-miners.com, it seems quite reasonable to answer it here.

First, I need to note that Michael and I are not statisticians. We don't even play one on TV (hmm, that's an interesting idea). However, we have gleaned some knowledge of statistics over the years, much from friends and colleagues who are respected statisticians.

Second, the question I am going to answer is the following: Assume that we do a test, with a test group and a control group. What we want to measure is whether the average dollars per customer is significantly different for the test group as compared to the control group. The challenge is that the dollar amounts themselve do not follow a known distribution, or the distribution is known not to be a normal distribution. For instance, we might only have two products, one that costs $10 and one that costs $100.

The reason that I'm restating the problem is because a term such as ROI (return on investment) gets thrown around a lot. In some cases, it could mean the current value of discounted future cash flows. Here, though, I think it simply means the dollar amount that customers spend (or invest, or donate, or whatever depending on the particular business).

The overall approach is that we want to measure the average and standard error for each of the groups. Then, we'll apply a simple "standard error" of the difference to see if the difference is consistently positive or negative. This is a very typical use of a z-score. And, it is a topic that I discuss in more detail in Chapter 3 of my book "Data Analysis Using SQL and Excel". In fact, the example here is slightly modified from the example in the book.

A good place to start is the Central Limit Theorem. This is a fundamental theorem for statistics. Assume that I have a population of things -- such as customers who are going to spend money in response to a marketing campaign. Assume that I take a sample of these customers and measure an average over the sample. Well, as I take more an more samples, the distribution of the averages follows a normal distribution regardless of the original distribution of values. (This is a slight oversimplification of the Central Limit Theorem, but it captures the important ideas.)

In addition, I can measure the relationship between the characteristics of the overall population and the characteristics of the sample:

(1) The average of the sample is as good an approximation as any of the average of the overall population.

(2) The standard error on the average of the sample is the standard deviation of the overall population divided by the square root of the size of the sample. Alternatively, we can phrase this in terms of variance: the variance of the sample average is the variance of the population average divided by the size of the sample.

Well, we are close. We know the average of each sample, because we can measure the average. If we knew the standard deviation of the overall population, then we could get the standard error for each group. Then, we'd know the standard error and we would be done. Well, it turns out that:

(3) The standard deviation of the sample is as good an approximation as any for the standard deviation of the population. This is convenient!

Let's assume that we have the following scenario.

Our test group has 17,839 customers, and the overall average purchase is $85.48. The control group has 53,537 customers, and the average purchase is $70.14. Is this statistically different?

We need some additional information, namely the standard deviation for each group. For the test group, the standard deviation is $197.23. For the control group, it is $196.67.

The standard error for the two groups is then $197.23/sqrt(17,839) and $196.67/sqrt(53,537), which comes to $1.48 and $0.85, respectively.

So, now the question is: is the difference of the means ($85.48 - $70.14 = $15.34) significantly different from zero. We need another formula from statistics to calculate the standard error of the difference. This formula says that the standard error is the square root of the sums of the squares of standard errors. So the value is $1.71 = sqrt(0.85^2 + 1.48^2).

And we have arrived at a place where we can use the z-score. The difference of $15.34 is about 9 standard deviations from 0 (that is, 9*1.71 is about 15.34). It is highly, highly, highly unlikely that the difference includes 0, so we can say that the test group is significantly better than the control group.

In short, we can apply the concepts of normal distributions, even to calculations on dollar amounts. We do need to be careful and pay attention to what we are doing, but the Central Limit Theorem makes this possible. If you are interested in this subject, I do strongly recommend Data Analysis Using SQL and Excel, particularly Chapter 3.

--gordon

Labels: , , ,