A good methodology to use for this kind of study is matched pairs. This allows you to
isolate the effects of a single variable while controlling for many others. The
idea is simple: To measure the effect of a treatment, you take pairs of
subjects who are similar in every way and give the treatment to one, but not
the other. In medical studies, twins come in handy for this purpose.
To simplify slightly, at TripAdvisor we have two ways to
generate revenue from the millions of travelers who come to one of our sites to
read reviews: they can click on link A to be taken to an on-line travel agency,
which pays us for the referral, or they can click on link B to be taken directly
to the site of a hotel that has subscribed to our business listing product. So
the question is “Does the presence of link B have an effect on the number of
clicks received by link A?” To answer this question, each property with a
business listing is paired with a “twin” that does not have a business listing.
The result is two cohorts with extremely similar distributions of average daily
rate, number of reviews, amount of traffic on review page, number of rooms, and
everything else I could think of that might influence clicks on link A. Since
the only consistent difference between the cohorts is the presence or absence
of link B, any statistically significant difference in link-A clicks can be
attributed to the presence of the business listing.
Why not just compare a random sample of hotels that have both links A and B with a random sample of hotels that have only link A?
Such a comparison would be very flattering to link B; on average, hotels with a
business listing subscription perform better than those without one on all
kinds of metrics, including clicks on link A. This is not surprising. Business
listings do not appeal to all properties equally, nor have they been marketed
with equal vigor in all markets and market segments. Such a study cannot
distinguish between a difference caused by link B and one that is merely
correlated with link B. For example, perhaps link B appeals more to hotels in
high-traffic destinations, and those same properties also attract more clicks
of all kinds.
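A toy illustration of that confounding problem, using made-up numbers rather than TripAdvisor data: in the sketch below, link B has no effect at all by construction, yet the naive comparison of averages still favors subscribers because they are concentrated in the high-traffic destination. The tuple layout and the `avg_clicks` helper are invented for this example.

```python
from statistics import mean

# Made-up data: (destination traffic, has link B, link-A clicks).
# By construction clicks depend only on traffic, never on link B.
hotels = (
    [("high", True, 100)] * 3 + [("high", False, 100)] * 1
    + [("low", True, 10)] * 1 + [("low", False, 10)] * 3
)

def avg_clicks(rows, has_b, traffic=None):
    """Average link-A clicks for hotels with/without link B, optionally within one traffic level."""
    return mean(
        clicks for t, b, clicks in rows
        if b == has_b and (traffic is None or t == traffic)
    )

# Naive comparison: all subscribers versus all non-subscribers.
naive_gap = avg_clicks(hotels, True) - avg_clicks(hotels, False)

# Comparison within each traffic level, where like is compared with like.
within_gaps = {
    t: avg_clicks(hotels, True, t) - avg_clicks(hotels, False, t)
    for t in ("high", "low")
}

print(naive_gap)    # 45.0
print(within_gaps)  # {'high': 0, 'low': 0}
```

The gap of 45 clicks comes entirely from where subscribers happen to be located; once hotels are compared only with similar hotels, it disappears.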
Why not do a longitudinal study?
The goal would be to compare the click rate before and
after link B goes live on a hotel’s review page. The problem with this approach
is that though the change in click rate is easy to measure, it is hard to
interpret. The quantity of clicks varies over time for all sorts of reasons
that have nothing to do with the presence or absence of a business listing. In
addition to seasonality, there is trend: the ever-increasing number of
TripAdvisor users means that clicks will tend to increase over time. Add to
that the effects of marketing campaigns, competition, changing exchange rates,
and political factors, and there is a lot of noise obscuring whatever signal is
in the data. A cross-sectional study, which compares subscribers and
non-subscribers over the same observation period, controls for all that.
How is similarity measured?
The matched pairs
methodology calls for each subscriber to be paired with the non-subscriber most
similar to it. For this study, there is a list of features that must match
exactly and another list of features which, as a group, must be “pretty close.”
The exact match features are categorical. The pretty close features are
numeric.
Exact match features
· Same price business listing.
· Same geographic region.
· Same category (Hotel, B&B, Specialty Lodging).
· Same chain status (a Hilton can match a Marriott, but neither can match an independent property).
· Matching properties are both on the first page of listings for a destination or both on some other page.
· Presence or absence of reviews supplied by our users.
Hotels that match on all of the above are candidates for matching. A hotel’s
actual match is its closest neighbor as determined by the “pretty close”
features. The exact match features control
for many variables that are not mentioned explicitly. For example, the price
charged for a business listing depends on the popularity of the destination and
the size of the property so hotels in the same pricing slice are similar in
size and traffic. Matching on geography controls for currency, climate,
language, and much else.
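The candidate step described above can be sketched as a simple grouping: non-subscribers are bucketed by their exact-match values, and each subscriber looks up the bucket with the same key. This is only a sketch. The field names (`price_slice`, `is_chain`, and so on) and the `make` helper are hypothetical, since the actual data schema is not described here.

```python
from collections import defaultdict

# Hypothetical field names standing in for the six exact-match features.
EXACT_KEYS = ("price_slice", "region", "category",
              "is_chain", "on_first_page", "has_reviews")

def exact_key(prop):
    """Tuple of the categorical values that must match exactly."""
    return tuple(prop[k] for k in EXACT_KEYS)

def candidate_pools(properties):
    """Map each subscriber's name to the non-subscribers sharing its exact-match key."""
    pools = defaultdict(list)
    for p in properties:
        if not p["subscriber"]:
            pools[exact_key(p)].append(p)
    return {
        p["name"]: pools.get(exact_key(p), [])
        for p in properties if p["subscriber"]
    }

def make(name, subscriber, region, on_first_page):
    """Build a made-up property record; the other fields are held constant."""
    return dict(name=name, subscriber=subscriber, region=region,
                on_first_page=on_first_page, price_slice="mid",
                category="B&B", is_chain=False, has_reviews=True)

properties = [
    make("Sub A", True, "Tuscany", True),
    make("Non 1", False, "Tuscany", True),   # same key as Sub A
    make("Non 2", False, "Tuscany", False),  # not on the first page
    make("Non 3", False, "Umbria", True),    # different region
]

pools = candidate_pools(properties)
print([p["name"] for p in pools["Sub A"]])  # ['Non 1']
```

A subscriber whose pool comes back empty has no eligible twin under these rules and would simply drop out of the study.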
Pretty close features
· Average daily rate.
· Number of rooms.
· Popularity ranking.
· Review page views.
The values of these features place each property at a
point in a four-dimensional space so it is easy to calculate the Euclidean
distance between any pair of properties. The closest candidate by Euclidean
distance is picked as the match. Because the features are all measured on
different scales, they must first be standardized to make distance along one
dimension comparable to distance along any other.
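A minimal sketch of the standardize-then-measure step, with several assumptions: standardization is taken to mean z-scores (subtract the mean, divide by the standard deviation), the statistics are computed over whatever rows are passed in, and the feature names and sample values are made up. A candidate may also be picked as the twin of more than one subscriber here, which may or may not match how the real study handled it.

```python
import math
from statistics import mean, pstdev

# Hypothetical names for the four "pretty close" features.
FEATURES = ("adr", "rooms", "rank", "page_views")

def standardize(rows):
    """Return one z-scored feature vector per row, in the same order as rows."""
    stats = []
    for f in FEATURES:
        values = [r[f] for r in rows]
        sd = pstdev(values)
        # Guard against a constant feature, which would give a zero divisor.
        stats.append((mean(values), sd if sd > 0 else 1.0))
    return [
        tuple((r[f] - m) / s for f, (m, s) in zip(FEATURES, stats))
        for r in rows
    ]

def nearest(target, candidates):
    """Index of the candidate vector closest to target by Euclidean distance."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(target, candidates[i]))

# Made-up rows: one subscriber followed by two candidate non-subscribers.
rows = [
    {"name": "S",  "adr": 100, "rooms": 12, "rank": 7, "page_views": 72},
    {"name": "N1", "adr": 100, "rooms": 12, "rank": 7, "page_views": 72},
    {"name": "N2", "adr": 250, "rooms": 80, "rank": 2, "page_views": 900},
]

z = standardize(rows)
best = nearest(z[0], z[1:])
twin = rows[1 + best]["name"]
print(twin, math.dist(z[0], z[1 + best]))  # N1 0.0
```

Without the standardization step, page views (measured in the hundreds) would swamp number of rooms and popularity ranking in the distance calculation.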
A few pairs are so well matched that, according to this
measure, they are distance 0 from each other.
The hotels on the left have business listings. The ones on
the right are their twins without business listings. Podere Perelli and Agriturismo
il Borghetto are twins because each has 12 rooms, each got exactly 72 page
views during the observation period, and each is seventh on its page.
The results
Deciding on the distance metric and creating the matched
pairs was most of the work. Once I had the pairings, I loaded 36,000 closely
matched pairs into JMP, a data exploration and analysis tool that includes a
matched pairs module.
In the diamond-shaped chart, the horizontal axis
represents increasing number of clicks on link A (“commerce clicks” in the
figure). To the left, where the number of clicks is low, there are some dots
below the red line indicating pairs where the non-subscriber got more link-A
clicks, but as the number of clicks increases, the business listings subscriber
nearly always wins.
In conclusion, after controlling for differences due to
geography, traffic, popularity, hotel category, number of rooms, presence or
absence of reviews, appearance on page one, and average daily rate, we counted
the number of clicks each twin received during a fixed observation period. There
was a statistically significant difference in the number of clicks on link A.
The average number of clicks for business listing subscribers was 597.49. The
average number for non-subscribers was 411.69, a difference of about 186 clicks
per property, or roughly 45 percent more. This is good news for our
subscribing hoteliers: In addition to the traffic we drive directly to their
sites, they see increased indirect traffic as well.
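For readers who want to try this kind of comparison without JMP, a paired t statistic is one common way to test matched pairs. Which test produced the result above is not stated, so treating it as a paired t-test is an assumption, and the click counts below are made up for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(subscriber_clicks, twin_clicks):
    """Return (mean difference, t statistic) for matched pairs.

    Each subscriber is compared only with its own twin, so the test
    works on the per-pair differences rather than on the two cohorts.
    """
    diffs = [s - t for s, t in zip(subscriber_clicks, twin_clicks)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / std_err

# Made-up link-A click counts for five pairs.
subscribers = [12, 30, 45, 80, 150]
twins = [14, 25, 40, 60, 100]

mean_diff, t_stat = paired_t(subscribers, twins)
print(round(mean_diff, 1), round(t_stat, 2))
```

The t statistic is then compared against a t distribution with n - 1 degrees of freedom. With tens of thousands of pairs, that distribution is practically indistinguishable from the normal.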