
The Long-term Effect of Marketing Strategy on Brand Sales

M. Berk Ataman, Harald J. van Heerde, and Carl F. Mela

WEB APPENDIX

MODEL ESTIMATION

For a given brand $j$ ($j = 1,\ldots,J$), Equations (2)-(6) can be combined into a single model and written as

(W.1a) $Y_t = F_{1t}\theta_{1t} + F_{2t}\theta_2 + \nu_t$,

(W.1b) $\theta_{1t} = G\theta_{1,t-1} + h_t + \omega_t$.

In Equations (W.1a) and (W.1b), $Y_t$ is a $(3S+K)\times 1$ vector of dependent variables including log sales, log regular price, log price index, and $K$ (= 4 in the application) marketing mix variables. $F_{1t}$ is a $(3S+K)\times N$ matrix of regressors, where $N$ (= 7 in the application) is the number of explanatory variables with a time-varying parameter (brand and store intercepts, log regular price, and advertising). $\theta_{1t}$ is an $N\times 1$ vector of brand-specific time-varying parameters, and $\nu_t$ is a $(3S+K)\times 1$ vector of observation equation errors. $F_{2t}$ is a $(3S+K)\times M$ matrix of regressors with non-time-varying parameters, which are collected in the $M\times 1$ vector $\theta_2$. $G$ (= diag($\alpha$), with $\alpha$ the vector of decay parameters) is an $N\times N$ matrix defining the system evolution, and $\omega_t$ is an $N\times 1$ vector of system errors. The $N\times 1$ vector $h_t = \lambda + \gamma' Z_{t-1}$ includes the system equation intercepts and the lagged marketing mix variables. Both error terms follow multivariate normal distributions. We assume that $\nu_t \sim N(0, V)$, where the variance matrix $V$, of size $(3S+K)\times(3S+K)$, is time invariant and full. Note that we correlate the sales, regular price, promotional price, and marketing mix error terms within each brand, which allows us to capture unobserved shocks that may cause endogeneity. The system errors are distributed multivariate normal, $\omega_t \sim N(0, W)$, where $W$ is a diagonal matrix of size $N\times N$.
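To make the state-space structure concrete, the following minimal Python sketch simulates one brand from a system of the form (W.1a)-(W.1b). All dimensions, parameter values, and variable names are illustrative stand-ins chosen by us, not the paper's data or estimates; the $F_{2t}\theta_2$ term is omitted and $h_t$ is held constant for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, P = 100, 3, 5              # periods, state dim, observation dim (P stands in for 3S+K)
G = np.diag([0.8, 0.6, 0.9])     # diagonal evolution matrix of decay parameters
W = np.diag([0.01, 0.02, 0.01])  # diagonal system-error covariance
V = 0.1 * np.eye(P)              # observation-error covariance (full in the paper; diagonal here)
h = np.zeros(N)                  # system intercept h_t, held constant for the sketch

theta1 = np.zeros((T, N))        # time-varying parameters theta_{1t}
Y = np.zeros((T, P))
theta1_prev = np.zeros(N)
for t in range(T):
    # (W.1b): theta_{1t} = G theta_{1,t-1} + h_t + omega_t
    theta1[t] = G @ theta1_prev + h + rng.multivariate_normal(np.zeros(N), W)
    F1t = rng.normal(size=(P, N))      # regressors with time-varying parameters
    # (W.1a): Y_t = F_{1t} theta_{1t} + nu_t  (F_{2t} theta_2 omitted)
    Y[t] = F1t @ theta1[t] + rng.multivariate_normal(np.zeros(P), V)
    theta1_prev = theta1[t]
```

Because the diagonal of $G$ is below one in absolute value, the simulated states mean-revert rather than drift, which is the sense in which the decay parameters govern the persistence of marketing effects.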

We place normal priors on all parameters of Equations (W.1a) and (W.1b). The evolution equation (W.1b) error covariance matrix is assumed to be diagonal, and we place an Inverse Gamma prior on its diagonal elements. Because we allow for correlation between the observation equation error terms and the marketing mix equation error terms in (W.1a), the associated error covariance matrix is full; we therefore place an Inverse Wishart prior on it. Given these priors, the estimation is carried out using DLM updating within a Gibbs sampler. Conditional on $\theta_2$, $V$, $W$, $G$, and $h_t$, the time-varying parameters ($\theta_{1t}$) are obtained via the forward filtering, backward sampling procedure (Carter and Kohn 1994; Frühwirth-Schnatter 1994). The long-term marketing mix effects ($\gamma$) are estimated using a random walk Metropolis-Hastings algorithm. Next, we derive the full conditional posterior distributions used in the sampling chain.

First, define $Y_t = [Y'_{1t}\ Y'_{2t}]'$ such that $Y_{1t}$ includes log sales of the focal brand and $Y_{2t}$ includes the rest (log regular price, log price index, and the $K$ marketing mix variables). Also define $\theta_2 = [\theta'_{21}\ \theta'_{22}]'$ and $F_{2t} = \mathrm{diag}([F_{21t}\ F_{22t}])$, where $\theta_{21}$ and $\theta_{22}$ contain the non-time-varying parameters from the sales equation and the remaining equations, respectively. As $Y_{1t}$ and $Y_{2t}$ are jointly normally distributed,

(W.2) $\begin{bmatrix} Y_{1t} \\ Y_{2t} \end{bmatrix} \sim N\left( \begin{bmatrix} F_{1t}\theta_{1t} + F_{21t}\theta_{21} \\ F_{22t}\theta_{22} \end{bmatrix}, \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} \right),$

the conditional covariance matrix is given by $\tilde{V} = V_{11} - V_{12}V_{22}^{-1}V_{21}$, and the conditional mean vector (net of sales attributed to the variables with non-time-varying parameters) is given by $\tilde{Y}_{1t} = Y_{1t} - V_{12}V_{22}^{-1}(Y_{2t} - F_{22t}\theta_{22}) - F_{21t}\theta_{21}$.
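The step above is ordinary multivariate-normal conditioning. A small numpy sketch with toy numbers (the first block plays the role of $Y_{1t}$, and `mu2` stands in for $F_{22t}\theta_{22}$; none of these values come from the paper):

```python
import numpy as np

# Partitioned covariance of (Y1, Y2); illustrative numbers, not the paper's estimates
V = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
V11, V12 = V[:1, :1], V[:1, 1:]
V21, V22 = V[1:, :1], V[1:, 1:]

# Conditional covariance of Y1 given Y2: V_tilde = V11 - V12 V22^{-1} V21
V_tilde = V11 - V12 @ np.linalg.solve(V22, V21)

# Net out the part of Y1 explained by Y2 (mu2 stands in for F_{22t} theta_{22});
# the F_{21t} theta_{21} term would still be subtracted afterwards
y1, y2 = np.array([1.0]), np.array([0.4, -0.2])
mu2 = np.zeros(2)
y1_tilde = y1 - V12 @ np.linalg.solve(V22, y2 - mu2)
```

Conditioning can only shrink the variance, so `V_tilde` is strictly smaller than the marginal variance $V_{11}$ whenever $V_{12}$ is nonzero.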
Assuming that the DLM is closed to external information at times $t \geq 1$ (i.e., given initial information $D_0$ at $t = 0$, at any future time $t$ the available information set is simply $D_t = \{\tilde{Y}_{1t}, D_{t-1}\}$), with $D_0$ including all values of $V$, $W$, $G$, and $h_t$, and $\theta_{10} \mid D_0 \sim N(m_0, C_0)$, the solution conditional on these parameters is given by West and Harrison (1997). The prior at time $t$ is $\theta_{1t} \mid D_{t-1} \sim N(a_t, R_t)$, where the mean and covariance matrix are $a_t = Gm_{t-1} + h_t$ and $R_t = GC_{t-1}G' + W$. The one-step-ahead forecast at time $t$ is $\tilde{Y}_{1t} \mid D_{t-1} \sim N(f_t, Q_t)$, where $f_t = F_t a_t$ and $Q_t = F_t R_t F'_t + \tilde{V}$. Then the posterior distribution at time $t$ is $\theta_{1t} \mid D_t \sim N(m_t, C_t)$, where $m_t = a_t + R_t F'_t Q_t^{-1}(\tilde{Y}_{1t} - f_t)$ and $C_t = R_t - R_t F'_t Q_t^{-1} F_t R_t$.

Step 1: $\theta_{1t}$ | rest

In order to sample from the conditional distribution of $\theta_{1t}$ for each brand $j$, we adopt the forward filtering, backward sampling algorithm proposed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). The sampling of the system parameters starts with the standard DLM updating. For $t = 1,\ldots,T$ we apply forward filtering to obtain the moments $m_t$ and $C_t$. At $t = T$ we sample a vector of system parameters from the distribution $N(m_T, C_T)$; then we sequence backwards for $t = T-1, \ldots, 1$, sampling from $p(\theta_{1t} \mid \theta_{1,t+1}, \cdot) \sim N(q^*_t, Q^*_t)$, where $q^*_t = m_t + B_t(\theta_{1,t+1} - a_{t+1})$, $Q^*_t = C_t - B_t R_{t+1} B'_t$, and $B_t = C_t G' R_{t+1}^{-1}$. For the starting values of the time-varying parameters, we use $m_0 = [0\ {-2}\ 0\ 0\ 0\ 0\ 0]'$, and set the initial variance $C_0$ to $I_N$.
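The recursions above can be sketched compactly in Python. This is a simplified illustration, not the authors' code: the $h_t$ term is dropped, the observation is scalar, and all dimensions and inputs are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

def ffbs(y, F, G, V, W, m0, C0):
    """Forward-filtering backward-sampling for theta_t = G theta_{t-1} + omega_t,
    y_t = F_t' theta_t + nu_t (scalar y_t; system intercept h_t omitted)."""
    T, N = len(y), len(m0)
    m, C = np.zeros((T, N)), np.zeros((T, N, N))
    a, R = np.zeros((T, N)), np.zeros((T, N, N))
    m_prev, C_prev = m0, C0
    for t in range(T):                        # forward filtering
        a[t] = G @ m_prev                     # prior mean a_t
        R[t] = G @ C_prev @ G.T + W           # prior covariance R_t
        f = F[t] @ a[t]                       # one-step forecast f_t
        Q = F[t] @ R[t] @ F[t] + V            # forecast variance Q_t (scalar)
        A = R[t] @ F[t] / Q                   # Kalman gain R_t F_t' Q_t^{-1}
        m[t] = a[t] + A * (y[t] - f)          # posterior mean m_t
        C[t] = R[t] - np.outer(A, A) * Q      # posterior covariance C_t
        m_prev, C_prev = m[t], C[t]
    theta = np.zeros((T, N))                  # backward sampling
    theta[-1] = rng.multivariate_normal(m[-1], C[-1])
    for t in range(T - 2, -1, -1):
        B = C[t] @ G.T @ np.linalg.inv(R[t + 1])   # B_t = C_t G' R_{t+1}^{-1}
        q = m[t] + B @ (theta[t + 1] - a[t + 1])   # q*_t
        Qs = C[t] - B @ R[t + 1] @ B.T             # Q*_t
        Qs = 0.5 * (Qs + Qs.T)                     # guard against tiny asymmetry
        theta[t] = rng.multivariate_normal(q, Qs)
    return theta

# toy run on simulated inputs
T, N = 50, 2
F = rng.normal(size=(T, N))
y = rng.normal(size=T)
theta_draw = ffbs(y, F, np.diag([0.9, 0.7]), 0.5, 0.05 * np.eye(N),
                  np.zeros(N), np.eye(N))
```

Each call returns one joint draw of the whole state path, which is what the Gibbs sampler needs at this step.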

Step 2: V|rest

For each brand $j$, we allow for correlations between all error terms and place an Inverse Wishart prior on the error covariance matrix. We use a diffuse prior for $V$ with prior scale matrix $S_{V0} = .001 I_{(3S+K)}$ and prior degrees of freedom $n_{V0} = (3S+K)+2$. The full conditional posterior distribution then has degrees of freedom $n_{V1} = n_{V0} + T$ and scale matrix $S_{V1} = S_{V0} + \sum_{t=1}^{T}(Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2)(Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2)'$.
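A sketch of this Inverse Wishart draw, with simulated residuals standing in for $Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2$ and `P` standing in for $3S+K$ (illustrative only):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

P, T = 4, 100                       # P stands in for 3S+K
resid = rng.normal(size=(T, P))     # simulated residuals Y_t - F_{1t}theta_{1t} - F_{2t}theta_2
S_V0 = 0.001 * np.eye(P)            # diffuse prior scale matrix
n_V0 = P + 2                        # prior degrees of freedom (3S+K)+2
n_V1 = n_V0 + T                     # posterior degrees of freedom
S_V1 = S_V0 + resid.T @ resid       # posterior scale: S_V0 + sum of outer products
V_draw = invwishart.rvs(df=n_V1, scale=S_V1, random_state=rng)
```

With `T` large relative to the prior degrees of freedom, the draw concentrates near the residual sample covariance, as expected for a diffuse prior.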

Step 3: $\theta_2$ | rest

In order to obtain the conditional posterior distribution of the non-time-varying parameters for each brand $j$, we define $Y^*_t = Y_t - F_{1t}\theta_{1t}$ and $V_T = V \otimes I_T$. We place a diffuse Normal prior on the parameters, $\theta_2 \sim N(\mu_{\theta_2}, \Sigma_{\theta_2})$, where $\mu_{\theta_2} = 0$ and $\Sigma_{\theta_2} = 1000 I_M$. Then the full conditional posterior is $\theta_2 \sim N(\bar{\mu}_{\theta_2}, \bar{\Sigma}_{\theta_2})$, where $\bar{\mu}_{\theta_2} = \bar{\Sigma}_{\theta_2}\{\Sigma_{\theta_2}^{-1}\mu_{\theta_2} + F'_{2} V_T^{-1} Y^*\}$ and $\bar{\Sigma}_{\theta_2} = \{\Sigma_{\theta_2}^{-1} + F'_{2} V_T^{-1} F_{2}\}^{-1}$, with $F_2$ and $Y^*$ stacking $F_{2t}$ and $Y^*_t$ over time.
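This is a standard conjugate Bayesian regression update. A sketch with simulated inputs (our notation; looping over $t$ with $V^{-1}$ is algebraically equivalent to the stacked $V_T^{-1}$ form because $V_T = V \otimes I_T$):

```python
import numpy as np

rng = np.random.default_rng(3)

M, T, P = 3, 80, 4
F2 = rng.normal(size=(T, P, M))           # simulated F_{2t} stacked over t
Vinv = np.linalg.inv(0.5 * np.eye(P))     # observation-error precision V^{-1}
Ystar = rng.normal(size=(T, P))           # simulated Y*_t = Y_t - F_{1t} theta_{1t}

prior_prec = np.eye(M) / 1000.0           # Sigma_theta2 = 1000 I_M, mu_theta2 = 0
prec = prior_prec.copy()
rhs = np.zeros(M)
for t in range(T):
    prec += F2[t].T @ Vinv @ F2[t]        # accumulate F'_{2t} V^{-1} F_{2t}
    rhs += F2[t].T @ Vinv @ Ystar[t]      # accumulate F'_{2t} V^{-1} Y*_t
Sigma_bar = np.linalg.inv(prec)           # posterior covariance
mu_bar = Sigma_bar @ rhs                  # posterior mean (prior mean term is zero)
theta2_draw = rng.multivariate_normal(mu_bar, Sigma_bar)
```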

Step 4: W|rest

For each brand $j$, we assume that the system equation error covariance matrix is diagonal and place an Inverse Gamma prior on the diagonal elements of this matrix, with $n_{W0}/2$ degrees of freedom and scale parameter $S_{W0}/2$. The full conditional posterior distribution is also Inverse Gamma, with $n_{W1} = n_{W0} + T - 1$ and $S_{W1} = S_{W0} + \sum_{t=2}^{T}(\theta_{1t} - G\theta_{1,t-1} - h_t)^2$, where the square is taken element by element. We use a diffuse prior with $n_{W0} = 3$ and $S_{W0} = .001$.
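A sketch of this element-wise Inverse Gamma update for the diagonal of $W$, using simulated system residuals in place of $\theta_{1t} - G\theta_{1,t-1} - h_t$:

```python
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(4)

T, N = 100, 3
# simulated system residuals theta_{1t} - G theta_{1,t-1} - h_t for t = 2,...,T
resid = rng.normal(scale=0.1, size=(T - 1, N))
n_W0, S_W0 = 3.0, 0.001                   # diffuse prior
n_W1 = n_W0 + T - 1                       # posterior degrees of freedom
S_W1 = S_W0 + (resid ** 2).sum(axis=0)    # element-wise sum of squared residuals
# Inverse Gamma with shape n_W1/2 and scale S_W1/2; one draw per diagonal element
W_diag = invgamma.rvs(a=n_W1 / 2.0, scale=S_W1 / 2.0, random_state=rng)
```

Because the matrix is diagonal, the $N$ variances can be drawn independently in one vectorized call.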

Step 5: $\alpha$ | rest

In this step we derive the full conditional posteriors of the decay parameters for each brand $j$ and system equation $i$. We place a Normal prior on all parameters, $\alpha_{ij} \sim N(\mu_{\alpha_{ij}}, \sigma^2_{\alpha_{ij}})$, where $\mu_{\alpha_{ij}} = 0$ and $\sigma^2_{\alpha_{ij}} = 1000$. We first stack the observations $\theta_{1it}$ across time in vectors $\theta_{1iT}$ and $\theta_{1iT-1}$, running from $t = 2,\ldots,T$ and $t = 1,\ldots,T-1$, respectively. We also stack the corresponding components of $h_t$ in $h_{iT}$. Then for each $i$ we define $y_{iT} = \theta_{1iT} - h_{iT}$ and $W_{iT} = W_i I_{T-1}$. Given the normal priors and the likelihood, the full conditional posterior distributions are $\alpha_{ij} \sim N(\bar{\mu}_{\alpha_{ij}}, \bar{\sigma}^2_{\alpha_{ij}})$, where $\bar{\mu}_{\alpha_{ij}} = \bar{\sigma}^2_{\alpha_{ij}}\{\sigma^{-2}_{\alpha_{ij}}\mu_{\alpha_{ij}} + \theta'_{1iT-1} W_{iT}^{-1} y_{iT}\}$ and $\bar{\sigma}^2_{\alpha_{ij}} = \{\sigma^{-2}_{\alpha_{ij}} + \theta'_{1iT-1} W_{iT}^{-1} \theta_{1iT-1}\}^{-1}$.
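Because $W$ is diagonal, each decay parameter update is a scalar conjugate normal regression. A self-contained sketch with simulated states (the true value 0.7 and all other numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

T = 100
theta_lag = rng.normal(size=T - 1)   # stacked theta_{1i,t-1}, t = 2,...,T
alpha_true = 0.7                     # illustrative decay parameter
W_i = 0.04                           # system-error variance for equation i
# y_iT = theta_{1iT} - h_iT, simulated from the evolution equation
y = alpha_true * theta_lag + rng.normal(scale=np.sqrt(W_i), size=T - 1)

prior_mean, prior_var = 0.0, 1000.0  # diffuse N(0, 1000) prior
post_var = 1.0 / (1.0 / prior_var + theta_lag @ theta_lag / W_i)
post_mean = post_var * (prior_mean / prior_var + theta_lag @ y / W_i)
alpha_draw = rng.normal(post_mean, np.sqrt(post_var))
```

With a diffuse prior, the posterior mean is essentially the least-squares regression of $y_{iT}$ on the lagged state, so the draw should sit close to the value used to simulate the data.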

Step 6: $\lambda$ | rest

In this step we derive the full conditional posteriors of the intercepts for each brand $j$ and system equation $i$ ($i = 1$ for the intercept, $i = 2$ for the elasticity). We place a Normal prior on all parameters, $\lambda_{ij} \sim N(\mu_{\lambda_{ij}}, \sigma^2_{\lambda_{ij}})$, where $\mu_{\lambda_{ij}} = 0$ and $\sigma^2_{\lambda_{ij}} = 1000$. We stack the observations $\theta_{1it}$ across time in vectors $\theta_{1iT}$ and $\theta_{1iT-1}$, running from $t = 2,\ldots,T$ and $t = 1,\ldots,T-1$, respectively. We also stack the corresponding components of $h_t$ in $h_{iT}$. Then for each $i$ we define $y_{iT} = \theta_{1iT} - \alpha_{ij}\theta_{1iT-1} - h_{iT}$ and $W_{iT} = W_i I_{T-1}$. Given the normal priors and the likelihood, the full conditional posterior distributions are $\lambda_{ij} \sim N(\bar{\mu}_{\lambda_{ij}}, \bar{\sigma}^2_{\lambda_{ij}})$, where $\bar{\mu}_{\lambda_{ij}} = \bar{\sigma}^2_{\lambda_{ij}}\{\sigma^{-2}_{\lambda_{ij}}\mu_{\lambda_{ij}} + \iota' W_{iT}^{-1} y_{iT}\}$ and $\bar{\sigma}^2_{\lambda_{ij}} = \{\sigma^{-2}_{\lambda_{ij}} + \iota' W_{iT}^{-1} \iota\}^{-1}$, with $\iota$ a $(T-1)\times 1$ vector of ones.

Step 7: $\gamma$ | rest

We use a random walk Metropolis-Hastings step within the Gibbs sampler to obtain each marketing mix coefficient in the two system equations. We generate the candidate draw by $\gamma^{(m)} = \gamma^{(m-1)} + sz$, where $(m)$ denotes the $m$th iteration and $z$ is a random draw from $N(0, I)$. We select the step size $s$ such that the acceptance rate is between 20% and 50% (Chib and Greenberg 1995). The candidate draw is accepted with probability $\alpha^* = \min\{1, \varphi\}$, where

(W.3) $\varphi = \dfrac{L(\gamma^{(m)} \mid \theta_1, W, \alpha, \lambda)\, p(\gamma^{(m)} \mid \mu_\gamma, \sigma_\gamma)}{L(\gamma^{(m-1)} \mid \theta_1, W, \alpha, \lambda)\, p(\gamma^{(m-1)} \mid \mu_\gamma, \sigma_\gamma)},$

$L(\gamma^{(\cdot)} \mid \cdot)$ is the conditional likelihood of Equation (W.2), and $p(\gamma^{(\cdot)} \mid \cdot)$ is the prior density evaluated at each $\gamma^{(\cdot)}$. We set $\mu_\gamma = 0$ and $\sigma_\gamma = 1000$.
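A minimal sketch of this random walk Metropolis-Hastings step. The stand-in log posterior below replaces the DLM conditional likelihood with a simple normal, purely to show the accept/reject mechanics and step-size tuning; none of the numbers are from the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

def log_post(gamma):
    """Stand-in log posterior: a normal log likelihood centered at 1 with
    variance 0.25, plus the diffuse N(0, 1000) log prior. In the actual
    sampler this would evaluate the conditional likelihood given theta_1,
    W, alpha, and lambda."""
    return -0.5 * (gamma - 1.0) ** 2 / 0.25 - 0.5 * gamma ** 2 / 1000.0

gamma, s = 0.0, 0.5           # current draw and step size s, tuned for 20%-50% acceptance
draws, accepted = [], 0
for m in range(2000):
    cand = gamma + s * rng.normal()               # gamma^(m) = gamma^(m-1) + s z
    log_ratio = log_post(cand) - log_post(gamma)  # log of the ratio in (W.3)
    if np.log(rng.uniform()) < log_ratio:         # accept with probability min{1, ratio}
        gamma, accepted = cand, accepted + 1
    draws.append(gamma)
acc_rate = accepted / 2000
```

Working with log densities avoids numerical underflow, and monitoring `acc_rate` is how the step size would be tuned in practice.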



REFERENCES
Carter, Chris K. and Robert Kohn (1994), "On Gibbs Sampling for State Space Models," Biometrika, 81 (3), 541-53.
Chib, Siddhartha and Edward Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm," The American Statistician, 49 (4), 327-35.
Frühwirth-Schnatter, Sylvia (1994), "Data Augmentation and Dynamic Linear Models," Journal of Time Series Analysis, 15 (2), 183-202.
West, Mike and Jeff Harrison (1997), Bayesian Forecasting and Dynamic Models, 2d ed. New York: Springer-Verlag.
