
The Long-term Effect of Marketing Strategy on Brand Sales

M. Berk Ataman, Harald J. van Heerde, and Carl F. Mela

WEB APPENDIX

MODEL ESTIMATION

For a given brand $j$ ($j = 1,\ldots,J$), Equations (2)-(6) can be combined into a single model and written as

(W.1a) $Y_t = F_{1t}\theta_{1t} + F_{2t}\theta_2 + \nu_t$,

(W.1b) $\theta_{1t} = G\theta_{1,t-1} + h_t + \omega_t$.

In Equations (W.1a) and (W.1b), $Y_t$ is a $(3S+K)\times 1$ vector of dependent variables including log sales, log regular price, log price index, and $K$ (= 4 in the application) marketing mix variables. $F_{1t}$ is a $(3S+K)\times N$ matrix of regressors, where $N$ (= 7 in the application) is the number of explanatory variables with a time-varying parameter (brand and store intercepts, log regular price, and advertising). $\theta_{1t}$ is an $N\times 1$ vector of brand-specific time-varying parameters, and $\nu_t$ is a $(3S+K)\times 1$ vector of observation equation errors. $F_{2t}$ is a $(3S+K)\times M$ matrix of regressors with non-time-varying parameters, which are collected in the $M\times 1$ vector $\theta_2$. $G$ (= diag($\alpha$), with $\alpha$ the vector of decay parameters) is an $N\times N$ matrix defining the system evolution, and $\omega_t$ is an $N\times 1$ vector of system errors. The $N\times 1$ vector $h_t = \lambda + \gamma' Z_{t-1}$ includes the system equation intercepts and the lagged marketing mix variables. Both error terms follow multivariate normal distributions. We assume that $\nu_t \sim N(0, V)$, where the variance matrix $V$, of size $(3S+K)\times(3S+K)$, is time invariant and full. Note that we correlate the sales, regular price, promotional price, and marketing mix error terms within each brand, which allows us to capture unobserved shocks that may cause endogeneity. The system errors are distributed multivariate normal, $\omega_t \sim N(0, W)$, where $W$ is a diagonal matrix of size $N\times N$.
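To make the state-space structure concrete, the following minimal Python sketch simulates one brand from a system of the form (W.1a)-(W.1b). All dimensions, parameter values, and variable names are illustrative stand-ins chosen by us, not the paper's data or estimates; the $F_{2t}\theta_2$ term is omitted and $h_t$ is held constant for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, P = 100, 3, 5              # periods, state dim, observation dim (P stands in for 3S+K)
G = np.diag([0.8, 0.6, 0.9])     # diagonal evolution matrix of decay parameters
W = np.diag([0.01, 0.02, 0.01])  # diagonal system-error covariance
V = 0.1 * np.eye(P)              # observation-error covariance (full in the paper; diagonal here)
h = np.zeros(N)                  # system intercept h_t, held constant for the sketch

theta1 = np.zeros((T, N))        # time-varying parameters theta_{1t}
Y = np.zeros((T, P))
theta1_prev = np.zeros(N)
for t in range(T):
    # (W.1b): theta_{1t} = G theta_{1,t-1} + h_t + omega_t
    theta1[t] = G @ theta1_prev + h + rng.multivariate_normal(np.zeros(N), W)
    F1t = rng.normal(size=(P, N))      # regressors with time-varying parameters
    # (W.1a): Y_t = F_{1t} theta_{1t} + nu_t  (F_{2t} theta_2 omitted)
    Y[t] = F1t @ theta1[t] + rng.multivariate_normal(np.zeros(P), V)
    theta1_prev = theta1[t]
```

Because the diagonal of $G$ is below one in absolute value, the simulated states mean-revert rather than drift, which is the sense in which the decay parameters govern the persistence of marketing effects.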

We place normal priors on all parameters of Equations (W.1a) and (W.1b). The evolution equation (W.1b) error covariance matrix is assumed to be diagonal, and we place an Inverse Gamma prior on its diagonal elements. Because we allow for correlation between the observation equation error terms and the marketing mix equation error terms in (W.1a), the associated error covariance matrix is full; we therefore place an Inverse Wishart prior on it. Given these priors, the estimation is carried out using DLM updating within a Gibbs sampler. Conditional on $\theta_2$, $V$, $W$, $G$, and $h_t$, the time-varying parameters ($\theta_{1t}$) are obtained via the forward filtering, backward sampling procedure (Carter and Kohn 1994; Frühwirth-Schnatter 1994). The long-term marketing mix effects ($\gamma$) are estimated using a random walk Metropolis-Hastings algorithm. Next, we derive the full conditional posterior distributions used in the sampling chain.

First, define $Y_t = [Y'_{1t}\ Y'_{2t}]'$ such that $Y_{1t}$ includes log sales of the focal brand and $Y_{2t}$ includes the rest (log regular price, log price index, and the $K$ marketing mix variables). Also define $\theta_2 = [\theta'_{21}\ \theta'_{22}]'$ and $F_{2t} = \mathrm{diag}([F_{21t}\ F_{22t}])$, where $\theta_{21}$ and $\theta_{22}$ contain the non-time-varying parameters from the sales equation and the remaining equations, respectively. As $Y_{1t}$ and $Y_{2t}$ are jointly normally distributed,

(W.2) $\begin{bmatrix} Y_{1t} \\ Y_{2t} \end{bmatrix} \sim N\left( \begin{bmatrix} F_{1t}\theta_{1t} + F_{21t}\theta_{21} \\ F_{22t}\theta_{22} \end{bmatrix}, \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} \right),$

the conditional covariance matrix is given by $\tilde{V} = V_{11} - V_{12}V_{22}^{-1}V_{21}$, and the conditional mean vector (net of sales attributed to the variables with non-time-varying parameters) is given by $\tilde{Y}_{1t} = Y_{1t} - V_{12}V_{22}^{-1}(Y_{2t} - F_{22t}\theta_{22}) - F_{21t}\theta_{21}$.
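The step above is ordinary multivariate-normal conditioning. A small numpy sketch with toy numbers (the first block plays the role of $Y_{1t}$, and `mu2` stands in for $F_{22t}\theta_{22}$; none of these values come from the paper):

```python
import numpy as np

# Partitioned covariance of (Y1, Y2); illustrative numbers, not the paper's estimates
V = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
V11, V12 = V[:1, :1], V[:1, 1:]
V21, V22 = V[1:, :1], V[1:, 1:]

# Conditional covariance of Y1 given Y2: V_tilde = V11 - V12 V22^{-1} V21
V_tilde = V11 - V12 @ np.linalg.solve(V22, V21)

# Net out the part of Y1 explained by Y2 (mu2 stands in for F_{22t} theta_{22});
# the F_{21t} theta_{21} term would still be subtracted afterwards
y1, y2 = np.array([1.0]), np.array([0.4, -0.2])
mu2 = np.zeros(2)
y1_tilde = y1 - V12 @ np.linalg.solve(V22, y2 - mu2)
```

Conditioning can only shrink the variance, so `V_tilde` is strictly smaller than the marginal variance $V_{11}$ whenever $V_{12}$ is nonzero.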
Assuming that the DLM is closed to external information at times $t \geq 1$ (i.e., given initial information $D_0$ at $t = 0$, at any future time $t$ the available information set is simply $D_t = \{\tilde{Y}_{1t}, D_{t-1}\}$), with $D_0$ including all values of $V$, $W$, $G$, and $h_t$, and $\theta_{10} \mid D_0 \sim N(m_0, C_0)$, the solution conditional on these parameters is given by West and Harrison (1997). The prior at time $t$ is $\theta_{1t} \mid D_{t-1} \sim N(a_t, R_t)$, where the mean and covariance matrix are $a_t = Gm_{t-1} + h_t$ and $R_t = GC_{t-1}G' + W$. The one-step-ahead forecast at time $t$ is $\tilde{Y}_{1t} \mid D_{t-1} \sim N(f_t, Q_t)$, where $f_t = F_t a_t$ and $Q_t = F_t R_t F'_t + \tilde{V}$. Then the posterior distribution at time $t$ is $\theta_{1t} \mid D_t \sim N(m_t, C_t)$, where $m_t = a_t + R_t F'_t Q_t^{-1}(\tilde{Y}_{1t} - f_t)$ and $C_t = R_t - R_t F'_t Q_t^{-1} F_t R_t$.

Step 1: $\theta_{1t}$ | rest

In order to sample from the conditional distribution of $\theta_{1t}$ for each brand $j$, we adopt the forward filtering, backward sampling algorithm proposed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). The sampling of the system parameters starts with the standard DLM updating. For $t = 1,\ldots,T$ we apply forward filtering to obtain the moments $m_t$ and $C_t$. At $t = T$ we sample a vector of system parameters from the distribution $N(m_T, C_T)$; then we sequence backwards for $t = T-1, \ldots, 1$, sampling from $p(\theta_{1t} \mid \theta_{1,t+1}, \cdot) \sim N(q^*_t, Q^*_t)$, where $q^*_t = m_t + B_t(\theta_{1,t+1} - a_{t+1})$, $Q^*_t = C_t - B_t R_{t+1} B'_t$, and $B_t = C_t G' R_{t+1}^{-1}$. For the starting values of the time-varying parameters, we use $m_0 = [0\ {-2}\ 0\ 0\ 0\ 0\ 0]'$, and set the initial variance $C_0$ to $I_N$.
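The recursions above can be sketched compactly in Python. This is a simplified illustration, not the authors' code: the $h_t$ term is dropped, the observation is scalar, and all dimensions and inputs are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

def ffbs(y, F, G, V, W, m0, C0):
    """Forward-filtering backward-sampling for theta_t = G theta_{t-1} + omega_t,
    y_t = F_t' theta_t + nu_t (scalar y_t; system intercept h_t omitted)."""
    T, N = len(y), len(m0)
    m, C = np.zeros((T, N)), np.zeros((T, N, N))
    a, R = np.zeros((T, N)), np.zeros((T, N, N))
    m_prev, C_prev = m0, C0
    for t in range(T):                        # forward filtering
        a[t] = G @ m_prev                     # prior mean a_t
        R[t] = G @ C_prev @ G.T + W           # prior covariance R_t
        f = F[t] @ a[t]                       # one-step forecast f_t
        Q = F[t] @ R[t] @ F[t] + V            # forecast variance Q_t (scalar)
        A = R[t] @ F[t] / Q                   # Kalman gain R_t F_t' Q_t^{-1}
        m[t] = a[t] + A * (y[t] - f)          # posterior mean m_t
        C[t] = R[t] - np.outer(A, A) * Q      # posterior covariance C_t
        m_prev, C_prev = m[t], C[t]
    theta = np.zeros((T, N))                  # backward sampling
    theta[-1] = rng.multivariate_normal(m[-1], C[-1])
    for t in range(T - 2, -1, -1):
        B = C[t] @ G.T @ np.linalg.inv(R[t + 1])   # B_t = C_t G' R_{t+1}^{-1}
        q = m[t] + B @ (theta[t + 1] - a[t + 1])   # q*_t
        Qs = C[t] - B @ R[t + 1] @ B.T             # Q*_t
        Qs = 0.5 * (Qs + Qs.T)                     # guard against tiny asymmetry
        theta[t] = rng.multivariate_normal(q, Qs)
    return theta

# toy run on simulated inputs
T, N = 50, 2
F = rng.normal(size=(T, N))
y = rng.normal(size=T)
theta_draw = ffbs(y, F, np.diag([0.9, 0.7]), 0.5, 0.05 * np.eye(N),
                  np.zeros(N), np.eye(N))
```

Each call returns one joint draw of the whole state path, which is what the Gibbs sampler needs at this step.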

Step 2: V|rest

For each brand $j$, we allow for correlations between all error terms and place an Inverse Wishart prior on the error covariance matrix. We use a diffuse prior for $V$ with prior scale matrix $S_{V0} = .001 I_{(3S+K)}$ and prior degrees of freedom $n_{V0} = (3S+K)+2$. The full conditional posterior distribution then has degrees of freedom $n_{V1} = n_{V0} + T$ and scale matrix $S_{V1} = S_{V0} + \sum_{t=1}^{T}(Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2)(Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2)'$.
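A sketch of this Inverse Wishart draw, with simulated residuals standing in for $Y_t - F_{1t}\theta_{1t} - F_{2t}\theta_2$ and `P` standing in for $3S+K$ (illustrative only):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

P, T = 4, 100                       # P stands in for 3S+K
resid = rng.normal(size=(T, P))     # simulated residuals Y_t - F_{1t}theta_{1t} - F_{2t}theta_2
S_V0 = 0.001 * np.eye(P)            # diffuse prior scale matrix
n_V0 = P + 2                        # prior degrees of freedom (3S+K)+2
n_V1 = n_V0 + T                     # posterior degrees of freedom
S_V1 = S_V0 + resid.T @ resid       # posterior scale: S_V0 + sum of outer products
V_draw = invwishart.rvs(df=n_V1, scale=S_V1, random_state=rng)
```

With `T` large relative to the prior degrees of freedom, the draw concentrates near the residual sample covariance, as expected for a diffuse prior.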

Step 3: $\theta_2$ | rest

In order to obtain the conditional posterior distribution of the non-time-varying parameters for each brand $j$, we define $Y^*_t = Y_t - F_{1t}\theta_{1t}$ and $V_T = V \otimes I_T$. We place a diffuse Normal prior on the parameters, $\theta_2 \sim N(\mu_{\theta_2}, \Sigma_{\theta_2})$, where $\mu_{\theta_2} = 0$ and $\Sigma_{\theta_2} = 1000 I_M$. Then the full conditional posterior is $\theta_2 \sim N(\bar{\mu}_{\theta_2}, \bar{\Sigma}_{\theta_2})$, where $\bar{\mu}_{\theta_2} = \bar{\Sigma}_{\theta_2}\{\Sigma_{\theta_2}^{-1}\mu_{\theta_2} + F'_{2} V_T^{-1} Y^*\}$ and $\bar{\Sigma}_{\theta_2} = \{\Sigma_{\theta_2}^{-1} + F'_{2} V_T^{-1} F_{2}\}^{-1}$, with $F_2$ and $Y^*$ stacking $F_{2t}$ and $Y^*_t$ over time.
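This is a standard conjugate Bayesian regression update. A sketch with simulated inputs (our notation; looping over $t$ with $V^{-1}$ is algebraically equivalent to the stacked $V_T^{-1}$ form because $V_T = V \otimes I_T$):

```python
import numpy as np

rng = np.random.default_rng(3)

M, T, P = 3, 80, 4
F2 = rng.normal(size=(T, P, M))           # simulated F_{2t} stacked over t
Vinv = np.linalg.inv(0.5 * np.eye(P))     # observation-error precision V^{-1}
Ystar = rng.normal(size=(T, P))           # simulated Y*_t = Y_t - F_{1t} theta_{1t}

prior_prec = np.eye(M) / 1000.0           # Sigma_theta2 = 1000 I_M, mu_theta2 = 0
prec = prior_prec.copy()
rhs = np.zeros(M)
for t in range(T):
    prec += F2[t].T @ Vinv @ F2[t]        # accumulate F'_{2t} V^{-1} F_{2t}
    rhs += F2[t].T @ Vinv @ Ystar[t]      # accumulate F'_{2t} V^{-1} Y*_t
Sigma_bar = np.linalg.inv(prec)           # posterior covariance
mu_bar = Sigma_bar @ rhs                  # posterior mean (prior mean term is zero)
theta2_draw = rng.multivariate_normal(mu_bar, Sigma_bar)
```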

Step 4: W|rest

For each brand $j$, we assume that the system equation error covariance matrix is diagonal and place an Inverse Gamma prior on the diagonal elements of this matrix, with $n_{W0}/2$ degrees of freedom and scale parameter $S_{W0}/2$. The full conditional posterior distribution is also Inverse Gamma, with $n_{W1} = n_{W0} + T - 1$ and $S_{W1} = S_{W0} + \sum_{t=2}^{T}(\theta_{1t} - G\theta_{1,t-1} - h_t)^2$, where the square is taken element by element. We use a diffuse prior with $n_{W0} = 3$ and $S_{W0} = .001$.
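A sketch of this element-wise Inverse Gamma update for the diagonal of $W$, using simulated system residuals in place of $\theta_{1t} - G\theta_{1,t-1} - h_t$:

```python
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(4)

T, N = 100, 3
# simulated system residuals theta_{1t} - G theta_{1,t-1} - h_t for t = 2,...,T
resid = rng.normal(scale=0.1, size=(T - 1, N))
n_W0, S_W0 = 3.0, 0.001                   # diffuse prior
n_W1 = n_W0 + T - 1                       # posterior degrees of freedom
S_W1 = S_W0 + (resid ** 2).sum(axis=0)    # element-wise sum of squared residuals
# Inverse Gamma with shape n_W1/2 and scale S_W1/2; one draw per diagonal element
W_diag = invgamma.rvs(a=n_W1 / 2.0, scale=S_W1 / 2.0, random_state=rng)
```

Because the matrix is diagonal, the $N$ variances can be drawn independently in one vectorized call.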

Step 5: $\alpha$ | rest

In this step we derive the full conditional posteriors of the decay parameters for each brand $j$ and system equation $i$. We place a Normal prior on all parameters, $\alpha_{ij} \sim N(\mu_{\alpha_{ij}}, \sigma^2_{\alpha_{ij}})$, where $\mu_{\alpha_{ij}} = 0$ and $\sigma^2_{\alpha_{ij}} = 1000$. We first stack the observations $\theta_{1it}$ across time in vectors $\theta_{1iT}$ and $\theta_{1iT-1}$, running from $t = 2,\ldots,T$ and $t = 1,\ldots,T-1$, respectively. We also stack the corresponding components of $h_t$ in $h_{iT}$. Then for each $i$ we define $y_{iT} = \theta_{1iT} - h_{iT}$ and $W_{iT} = W_i I_{T-1}$. Given the normal priors and the likelihood, the full conditional posterior distributions are $\alpha_{ij} \sim N(\bar{\mu}_{\alpha_{ij}}, \bar{\sigma}^2_{\alpha_{ij}})$, where $\bar{\mu}_{\alpha_{ij}} = \bar{\sigma}^2_{\alpha_{ij}}\{\sigma^{-2}_{\alpha_{ij}}\mu_{\alpha_{ij}} + \theta'_{1iT-1} W_{iT}^{-1} y_{iT}\}$ and $\bar{\sigma}^2_{\alpha_{ij}} = \{\sigma^{-2}_{\alpha_{ij}} + \theta'_{1iT-1} W_{iT}^{-1} \theta_{1iT-1}\}^{-1}$.
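Because $W$ is diagonal, each decay parameter update is a scalar conjugate normal regression. A self-contained sketch with simulated states (the true value 0.7 and all other numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

T = 100
theta_lag = rng.normal(size=T - 1)   # stacked theta_{1i,t-1}, t = 2,...,T
alpha_true = 0.7                     # illustrative decay parameter
W_i = 0.04                           # system-error variance for equation i
# y_iT = theta_{1iT} - h_iT, simulated from the evolution equation
y = alpha_true * theta_lag + rng.normal(scale=np.sqrt(W_i), size=T - 1)

prior_mean, prior_var = 0.0, 1000.0  # diffuse N(0, 1000) prior
post_var = 1.0 / (1.0 / prior_var + theta_lag @ theta_lag / W_i)
post_mean = post_var * (prior_mean / prior_var + theta_lag @ y / W_i)
alpha_draw = rng.normal(post_mean, np.sqrt(post_var))
```

With a diffuse prior, the posterior mean is essentially the least-squares regression of $y_{iT}$ on the lagged state, so the draw should sit close to the value used to simulate the data.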

Step 6: $\lambda$ | rest

In this step we derive the full conditional posteriors of the intercepts for each brand $j$ and system equation $i$ ($i = 1$ for the intercept, $i = 2$ for the elasticity). We place a Normal prior on all parameters, $\lambda_{ij} \sim N(\mu_{\lambda_{ij}}, \sigma^2_{\lambda_{ij}})$, where $\mu_{\lambda_{ij}} = 0$ and $\sigma^2_{\lambda_{ij}} = 1000$. We stack the observations $\theta_{1it}$ across time in vectors $\theta_{1iT}$ and $\theta_{1iT-1}$, running from $t = 2,\ldots,T$ and $t = 1,\ldots,T-1$, respectively. We also stack the corresponding components of $h_t$ in $h_{iT}$. Then for each $i$ we define $y_{iT} = \theta_{1iT} - \alpha_{ij}\theta_{1iT-1} - h_{iT}$ and $W_{iT} = W_i I_{T-1}$. Given the normal priors and the likelihood, the full conditional posterior distributions are $\lambda_{ij} \sim N(\bar{\mu}_{\lambda_{ij}}, \bar{\sigma}^2_{\lambda_{ij}})$, where $\bar{\mu}_{\lambda_{ij}} = \bar{\sigma}^2_{\lambda_{ij}}\{\sigma^{-2}_{\lambda_{ij}}\mu_{\lambda_{ij}} + \iota' W_{iT}^{-1} y_{iT}\}$ and $\bar{\sigma}^2_{\lambda_{ij}} = \{\sigma^{-2}_{\lambda_{ij}} + \iota' W_{iT}^{-1} \iota\}^{-1}$, with $\iota$ a $(T-1)\times 1$ vector of ones.

Step 7: $\gamma$ | rest

We use a random walk Metropolis-Hastings step within the Gibbs sampler to obtain each marketing mix coefficient in the two system equations. We generate the candidate draw by $\gamma^{(m)} = \gamma^{(m-1)} + sz$, where $(m)$ denotes the $m$th iteration and $z$ is a random draw from $N(0, I)$. We select the step size $s$ such that the acceptance rate is between 20% and 50% (Chib and Greenberg 1995). The candidate draw is accepted with probability $\alpha^* = \min\{1, \varphi\}$, where

(W.3) $\varphi = \dfrac{L(\gamma^{(m)} \mid \theta_1, W, \alpha, \lambda)\, p(\gamma^{(m)} \mid \mu_\gamma, \sigma_\gamma)}{L(\gamma^{(m-1)} \mid \theta_1, W, \alpha, \lambda)\, p(\gamma^{(m-1)} \mid \mu_\gamma, \sigma_\gamma)},$

$L(\gamma^{(\cdot)} \mid \cdot)$ is the conditional likelihood of Equation (W.2), and $p(\gamma^{(\cdot)} \mid \cdot)$ is the prior density evaluated at each $\gamma^{(\cdot)}$. We set $\mu_\gamma = 0$ and $\sigma_\gamma = 1000$.
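A minimal sketch of this random walk Metropolis-Hastings step. The stand-in log posterior below replaces the DLM conditional likelihood with a simple normal, purely to show the accept/reject mechanics and step-size tuning; none of the numbers are from the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

def log_post(gamma):
    """Stand-in log posterior: a normal log likelihood centered at 1 with
    variance 0.25, plus the diffuse N(0, 1000) log prior. In the actual
    sampler this would evaluate the conditional likelihood given theta_1,
    W, alpha, and lambda."""
    return -0.5 * (gamma - 1.0) ** 2 / 0.25 - 0.5 * gamma ** 2 / 1000.0

gamma, s = 0.0, 0.5           # current draw and step size s, tuned for 20%-50% acceptance
draws, accepted = [], 0
for m in range(2000):
    cand = gamma + s * rng.normal()               # gamma^(m) = gamma^(m-1) + s z
    log_ratio = log_post(cand) - log_post(gamma)  # log of the ratio in (W.3)
    if np.log(rng.uniform()) < log_ratio:         # accept with probability min{1, ratio}
        gamma, accepted = cand, accepted + 1
    draws.append(gamma)
acc_rate = accepted / 2000
```

Working with log densities avoids numerical underflow, and monitoring `acc_rate` is how the step size would be tuned in practice.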



REFERENCES
Carter, Chris K. and Robert Kohn (1994), "On Gibbs Sampling for State Space Models," Biometrika, 81 (3), 541-53.
Chib, Siddhartha and Edward Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm," The American Statistician, 49 (4), 327-35.
Frühwirth-Schnatter, Sylvia (1994), "Data Augmentation and Dynamic Linear Models," Journal of Time Series Analysis, 15 (2), 183-202.
West, Mike and Jeff Harrison (1997), Bayesian Forecasting and Dynamic Models, 2d ed. New York: Springer-Verlag.
