
Workshop Overview

• Module 1: Data

• Module 2: Model and Estimation

• Module 3: Sample Output and Empirical Generalization


Outline
• Ideal Data for Promotion/Pricing Analytics: Scanner
Data (in CPG)

• Data and Inference: What Can Go Wrong?

• Challenges and Common Mistakes in Consumer Electronics

• Data Requirement and Potential Data Source


Scanner Data (Store Audit Data)
How is Data Collected?
• Syndicated data providers: IRI and A.C. Nielsen
• Sample of stores (Grocery, Drug, Convenience, Mass
Merchandiser, Warehouse stores)
• Scanner data
– UPC info (product features), (Retail) price, Quantity (Volume)
all recorded
• Features
– Centrally collected and coded (daily)
• Displays
– Collected by store auditors (1X/week)
4 Data Dimensions

• The Data Cube


– Geography (Market) x Product x Time x Variable (Measure)
– G x P x T x V > 1,000,000 even for one category

• Aggregation (chain/regions, SKU groups, temporal)


Scanner Data: Advantages

• Completeness
– Linking aggregate sales movements to marketing instruments
(price, feature, display, etc)
– Obtaining a richer set of performance measures beyond market
share and factory shipments

• Timeliness
– Getting the data within a window that allows for meaningful
managerial action (i.e., well under the old lag time of 8 weeks or
more)

• Accuracy
Scanner Data: Limitations

• Not a complete sampling frame: excluded stores
– Small shops, Walmart!

• Hard to make causal statements without careful modeling: non-random assignment

• No information on consumer behaviors before purchases (e.g. search, consideration) and
consumption after purchases

• No information on psychographics
Promotion Analytics from Scanner Data

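• A simplistic picture

[Figure: market share by week (weeks 1-8) around a promotion week. Share is 5% at baseline, rises to 8% in the promotion week, and dips to 4.8% and 4.5% in the surrounding weeks, which are labeled Purchase Deceleration and Purchase Acceleration.]

Net Effect = (8 - 5) - 0.2 - 0.5 = 2.3%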
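The net-effect arithmetic on this slide can be written out as a short calculation. This is a minimal sketch, assuming that 5% is the baseline share, 8% is the promotion-week share, and 4.8% and 4.5% are the shares in the two dip weeks. The function name is illustrative, not from the workshop.

```python
import math

def net_promotion_effect(baseline, promo_share, dip_shares):
    """Gross lift in the promotion week minus sales borrowed from other weeks."""
    gross_lift = promo_share - baseline
    borrowed = sum(baseline - share for share in dip_shares)
    return gross_lift - borrowed

# Shares in percent, taken from the slide.
net = net_promotion_effect(baseline=5.0, promo_share=8.0, dip_shares=[4.8, 4.5])
print(round(net, 1))  # 2.3 share points
```

The point of subtracting the dips is that a promotion can look like a 3-point lift in its own week while part of that volume was simply moved from adjacent weeks.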
Promotions: Actual data

[Figure: weekly market share and price over roughly 30 weeks, with promotion weeks marked. F = Feature, D = Display, C = Store Coupon]
Promotion Types

• Feature
• (End of Aisle) Display
• Price-cut (BOGO)
• Coupon
1. Size of Data ≠ Information in Data
• Consider the following two options:

(1) Wal-Mart with 4,000 stores, 52 weeks of data, 500 SKUs (104 million observations!)

(2) Best Buy with 1,500 stores, 52 weeks of data, 500 SKUs (39 million observations)

• Which dataset would be more useful for measuring price response?

[Figure: two panels of weekly prices for products P1, P2, and P3 over weeks 1-51 (price axis 0-25). Left panel: Wal-Mart (EDLP). Right panel: Best Buy (Hi-Lo).]
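One way to see why the observation count alone does not settle the question is a small simulation. This is a sketch under invented assumptions: a log-log demand model with a true price elasticity of -2, rare and shallow price cuts for the EDLP retailer, and frequent and deep cuts for the Hi-Lo retailer. All parameter values and function names are hypothetical. Only the store and week counts come from the slide.

```python
import math
import random

def simulate_prices(n_obs, base_price, promo_prob, promo_depth, rng):
    """Store-week prices: the regular price, occasionally cut by promo_depth."""
    return [base_price * (1 - promo_depth) if rng.random() < promo_prob else base_price
            for _ in range(n_obs)]

def elasticity_estimate(prices, true_elasticity, noise_sd, rng):
    """Simulate log-log demand, then return the OLS slope and its standard error."""
    x = [math.log(p) for p in prices]
    y = [5.0 + true_elasticity * xi + rng.gauss(0, noise_sd) for xi in x]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)

rng = random.Random(0)
# One SKU observed in every store-week (store and week counts from the slide).
edlp_prices = simulate_prices(4000 * 52, 20.0, promo_prob=0.02, promo_depth=0.02, rng=rng)
hilo_prices = simulate_prices(1500 * 52, 20.0, promo_prob=0.25, promo_depth=0.20, rng=rng)

b_edlp, se_edlp = elasticity_estimate(edlp_prices, -2.0, 0.5, rng)
b_hilo, se_hilo = elasticity_estimate(hilo_prices, -2.0, 0.5, rng)
print(f"EDLP:  elasticity {b_edlp:.2f}, standard error {se_edlp:.3f}")
print(f"Hi-Lo: elasticity {b_hilo:.2f}, standard error {se_hilo:.3f}")
```

Under these assumptions the smaller Hi-Lo dataset yields the tighter estimate, because the precision of a price coefficient depends on how much prices vary, not only on how many rows there are.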
2. Pay Attention to Signal-to-Noise Ratio
• Consider the following measurement. Does the marketing event have a significant impact?

Revenue    Before Event    After Event    % Change
Average    10              13             30

• Well, it depends on the signal-to-noise ratio!

[Figure: two panels, each titled "Revenue before/after event", plotting revenue over periods 0-120. The vertical axis runs from 0 to 16 in the left panel and from 0 to 120 in the right panel.]
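The same 10-to-13 change in average revenue can be clear or invisible depending on the noise around it. The sketch below simulates both cases and computes a Welch t-statistic by hand. The noise levels (1 and 30), the 60 periods per side, and the function names are assumptions chosen for illustration, not values read from the charts.

```python
import math
import random
import statistics

def welch_t(before, after):
    """Welch t-statistic for the difference in means (after minus before)."""
    diff = statistics.fmean(after) - statistics.fmean(before)
    se = math.sqrt(statistics.variance(before) / len(before)
                   + statistics.variance(after) / len(after))
    return diff / se

def simulate(noise_sd, n=60, seed=1):
    """Revenue averages 10 before the event and 13 after, plus Gaussian noise."""
    rng = random.Random(seed)
    before = [rng.gauss(10, noise_sd) for _ in range(n)]
    after = [rng.gauss(13, noise_sd) for _ in range(n)]
    return welch_t(before, after)

t_low_noise = simulate(noise_sd=1.0)
t_high_noise = simulate(noise_sd=30.0)
print(f"low noise:  t = {t_low_noise:.1f}")
print(f"high noise: t = {t_high_noise:.1f}")
```

A common rule of thumb treats an absolute t-statistic above about 2 as evidence of a real change. With low noise the 30% lift clears that bar easily, while with high noise the same lift is hard to distinguish from chance.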
3. Be Careful about Reverse Causality
• Imagine the following data generating process.
– $\mathrm{Sales}_{m,t} = \mathrm{Intercept}_m + 0 \cdot \mathrm{Adv}_{m,t} + \varepsilon_{m,t}$
– $\mathrm{Adv}_{m,t} = 0.1 \cdot \mathrm{Sales}_{m,t-1} + \omega_{m,t}$

• If you ignore the reverse causality in your analysis, you may conclude the following.

[Figure: scatter plot of Sales (m,t) (vertical axis, 0-60) against Adv (m,t) (horizontal axis, 0-6)]

                   Coefficients     Standard Error    P-value
Intercept          -3.357941562     1.121241507       0.004875
Advertising (t)     9.716546286     0.354635984       3.58E-26

Significant impact of advertising?

R square: 0.95 -> Good fit!
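The data generating process above can be simulated to show how a pooled regression finds a large advertising effect even though the true effect is zero. This is a minimal sketch: the number of markets and weeks, the range of the market intercepts, and the noise levels are assumptions, since the slide does not state them. The within-market regression at the end is one possible remedy, in the spirit of using within-unit variation, and not necessarily the workshop's own method.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate_panel(n_markets=50, n_weeks=52, seed=0):
    """Per-market (adv, sales) series in which advertising has zero true effect."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_markets):
        intercept = rng.uniform(10, 50)  # assumed spread of market sizes
        sales = [intercept + rng.gauss(0, 2) for _ in range(n_weeks)]
        # Advertising in week t is set as 10% of sales in week t-1, plus noise.
        adv = [0.1 * sales[t - 1] + rng.gauss(0, 0.2) for t in range(1, n_weeks)]
        panel.append((adv, sales[1:]))
    return panel

panel = simulate_panel()

# Pooled regression across all markets and weeks, ignoring how Adv was set.
pooled_x = [a for adv, _ in panel for a in adv]
pooled_y = [s for _, sales in panel for s in sales]
b_pooled = ols_slope(pooled_x, pooled_y)

# Within-market regression: subtract each market's own means first.
within_x, within_y = [], []
for adv, sales in panel:
    a_bar, s_bar = sum(adv) / len(adv), sum(sales) / len(sales)
    within_x += [a - a_bar for a in adv]
    within_y += [s - s_bar for s in sales]
b_within = ols_slope(within_x, within_y)

print(f"pooled slope: {b_pooled:.2f}")
print(f"within slope: {b_within:.2f}")
```

In this setup the pooled slope is large because big markets both sell more and advertise more. Once each market is compared only with itself, the slope falls to a value much closer to the true effect of zero.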


4. Omitted Variables Can Be Dangerous
• Often we do not have data on important variables that affect sales, revenue, or profits.
– Analytics that ignore these “omitted variables” can produce “biased” estimates of
marketing mix effects.

• Think about the graph below (from NYT). Is family income really responsible for better
academic achievement? What omitted variables could bias that conclusion?
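The size and direction of the bias can be read off the standard omitted-variable-bias formula. The notation here is an assumption for illustration: y is the outcome (e.g. sales), x is the observed marketing variable, z is the omitted variable, and the "short" regression is the one that leaves z out.

```latex
y = \alpha + \beta x + \gamma z + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\text{short}}
  = \beta + \gamma \,\frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The bias term vanishes only if the omitted variable has no effect on the outcome (γ = 0) or is uncorrelated with the included variable (Cov(x, z) = 0).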
5. Selection by Outcome: Bad Idea!
• Problem: Often, two groups that are defined by an outcome variable are compared
to infer the causal impact of the marketing mix

• Example
– To calculate the ROI of a paid search campaign, advertisers compare the “conversion
rates” of each “search” keyword. Usually, branded keywords are shown to have high
conversion rates (> 6%) compared to generic keywords (~ 1%).

• How to fix the problem?
– Use a proper “control” condition!
– In the paid search example, all sales and profit from consumers who click on
branded keywords are attributed to paid search. The implicit
assumption is that all of those sales and profits would be lost without paid search. Really?
– It’s possible that consumers who use branded keywords are already quite committed to
purchase, and they may simply switch to unpaid (organic) search links if paid
search is turned off.
– A proper control in this case is “halting selected search engine marketing keywords”
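The control condition described above can be turned into a simple incrementality calculation. This is a sketch that assumes the "on" and "off" periods are comparable in length and underlying demand. The function name and all conversion counts are made up for illustration.

```python
import math

def incremental_share(conv_paid_on, conv_organic_on, conv_organic_off):
    """Share of paid-search conversions that are truly incremental.

    conv_paid_on:     conversions via paid links while the keyword is live
    conv_organic_on:  conversions via organic links in the same period
    conv_organic_off: conversions via organic links while the keyword is halted
    """
    total_on = conv_paid_on + conv_organic_on
    lost_when_off = total_on - conv_organic_off
    return lost_when_off / conv_paid_on

# Hypothetical branded keyword: most paid clicks shift to organic links when halted.
branded = incremental_share(conv_paid_on=600, conv_organic_on=400, conv_organic_off=950)
# Hypothetical generic keyword: most paid conversions disappear when halted.
generic = incremental_share(conv_paid_on=100, conv_organic_on=50, conv_organic_off=60)
print(f"branded: {branded:.0%} incremental, generic: {generic:.0%} incremental")
```

With these invented numbers the branded keyword has far more paid conversions but far fewer incremental ones, which is the pattern the slide warns about when keywords are ranked by conversion rate alone.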
Key Challenges
• There are no syndicated data providers such as IRI and Nielsen in Consumer
Electronics

• Slightly better situation in North America or Europe
– NPD (U.S.) and GfK (Europe) provide market (or retail channel) level unit sales
and price data by SKU
– However, they do not provide promotion details
– Even with promotion data, the use of market (or channel) level data can cause
aggregation bias (i.e. overestimation of promotion effects)

• You have to assemble multiple datasets on your own
– At least 2-3 datasets need to be merged
– SKU-level unit sales data from ERP + External tracking service data (on price
and promotion): half-blind (no sales info for competitors)
– Better data access if you are a category captain
– Most painful and time-consuming step: organizational silos
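The merge step described above might look like the following in its simplest form. This is a sketch only: the field names (`sku`, `week`, `units`, `price`, `promo`), the sample rows, and the choice of SKU and week as the join key are all hypothetical, and a real ERP or tracking extract will differ.

```python
def merge_sales_with_tracking(erp_rows, tracking_rows):
    """Join ERP unit sales to tracked price/promotion by (sku, week).

    Returns the matched rows plus the keys that found no match, so that
    gaps between the two sources are visible instead of silently dropped.
    """
    tracking = {(row["sku"], row["week"]): row for row in tracking_rows}
    merged, unmatched = [], []
    for row in erp_rows:
        key = (row["sku"], row["week"])
        match = tracking.get(key)
        if match is None:
            unmatched.append(key)
            continue
        merged.append({**row, "price": match["price"], "promo": match["promo"]})
    return merged, unmatched

# Hypothetical extracts from the two sources.
erp_rows = [
    {"sku": "A1", "week": 1, "units": 120},
    {"sku": "A1", "week": 2, "units": 310},
    {"sku": "A1", "week": 3, "units": 140},
]
tracking_rows = [
    {"sku": "A1", "week": 1, "price": 499.0, "promo": 0},
    {"sku": "A1", "week": 2, "price": 449.0, "promo": 1},
]

merged, unmatched = merge_sales_with_tracking(erp_rows, tracking_rows)
print(len(merged), "matched rows; unmatched keys:", unmatched)
```

Reporting the unmatched keys is a deliberate choice: when datasets come from different organizational silos, mismatched SKU codes and calendars are common, and they are easier to fix when they are counted.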
Common Mistakes: For Discussion
• Use factory shipment data instead of retail sales data
– Biased promotion effect estimates due to forward buying by retailers

• Use cross-sectional data to measure price/promotion effects
– Biased price or promotion effect estimates due to omitted variable bias
– Better to use panel data and identify effects from within-store (or within-chain) variation

• Use market (or channel) level data
– Promotion effects are not homogeneous within a market (or channel)
– Due to aggregation bias, promotion effects will be overstated
– Better to use store, account, or chain-level data, where promotion activities do not vary
within a unit

• Use data with a short history (1 year or less)
– At least 2-3 years of data are required to properly measure seasonality

• Ignore price changes and promotions by competitors
– Biased estimates of baseline sales and price/promotion effects
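The aggregation-bias point can be illustrated with a simulation. This is a sketch of one mechanism under invented assumptions: store sales follow a multiplicative (log-linear) model with a true promotion effect of 1.0, and only some stores promote in any given week. The market-level analyst sees total sales and the share of stores on promotion. All parameter values and names are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate(n_stores=100, n_weeks=104, true_effect=1.0, seed=0):
    """Return the promotion effect estimated at store level and at market level."""
    rng = random.Random(seed)
    store_x, store_y, market_x, market_y = [], [], [], []
    for _ in range(n_weeks):
        promo_rate = rng.uniform(0.0, 0.3)  # chance that a store promotes this week
        total_sales, n_promo = 0.0, 0
        for _ in range(n_stores):
            promo = 1 if rng.random() < promo_rate else 0
            log_sales = 3.0 + true_effect * promo + rng.gauss(0, 0.2)
            store_x.append(promo)
            store_y.append(log_sales)
            total_sales += math.exp(log_sales)
            n_promo += promo
        # Market-level view: log of total sales vs. share of stores promoting.
        market_x.append(n_promo / n_stores)
        market_y.append(math.log(total_sales))
    return ols_slope(store_x, store_y), ols_slope(market_x, market_y)

b_store, b_market = simulate()
print(f"store-level estimate:  {b_store:.2f}")
print(f"market-level estimate: {b_market:.2f}")
```

In this setup the store-level regression recovers the true effect, while the market-level regression overstates it, because summing sales across stores before taking logs does not preserve the store-level relationship.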
Consumer Sales vs. Factory Shipments
[Figure: factory shipments and retail sales over time, 1978-1982 (vertical axis 20,000-100,000), with promotion periods marked]
Data Requirement
• Key elements of data
– Unit sales by SKU (outcome): ideally for the entire category (including competitors), but
the analysis is still feasible with data for only the focal company's own SKUs
– Price measures by SKU (causal): focal company + competitors
– Promotion measures by SKU/product line/brand (causal): focal company + competitors

• Duration
– Ideally 3 years of weekly data; at least 2 years
– Needed to properly control for seasonality

• Level of aggregation
– Ideally store-level data; chain or account (chain-market combination) data can be used
as long as promotion/price policies are uniform (within chain or account)
– Using market or channel-level data can overstate promotion effects due to
aggregation bias

• Type of response data: Retail sales data (do not use factory shipment data)
– Factory shipments are distorted by forward buying by retailers
Potential Data Source: For Discussion
• Key elements of data

– Unit sales by SKUs (outcome)

– Price measures by SKU (causal)

– Promotion measures by SKU/product line/brand (causal)
