

Multiperiod Portfolio Selection and Bayesian Dynamic Models

Techniques inspired by Bayesian statistics provide an elegant solution to the classic investment problem of optimally planning a sequence of trades in the presence of transaction costs, according to Petter Kolm¹ and Gordon Ritter¹.

¹ Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, NY 10012

Original version: July 27, 2014. This version: September 22, 2014

Planning a sequence of trades extending into the future is a very common problem in finance. Merton (1969) and Samuelson (1969) considered agents seeking decisions which maximize total anticipated utility over time, a departure from the one-period portfolio selection theories of the time.

All trading is costly, and the need for intertemporal optimization is more acute when trading costs are considered. The total cost due to market impact is known to be superlinear as a function of the trade size (Almgren et al. (2005) measured an exponent of about 0.6 for impact itself, hence 1.6 for total cost), implying that a large order may be more efficiently executed as a sequence of small orders. Indeed, optimal liquidation paths had already been studied by Almgren and Chriss (1999) under an idealized linear impact model, leading to quadratic total cost.

A similar, but more complex problem is faced by the discretionary trader, who can set the time horizon and who can wait to deploy an alpha strategy until there is a trading path with favorable expected utility. Further, the drivers of demand for trading may differ vastly at different horizons. A simple example is when one is fundamentally bullish on a stock, but also anticipates that near-term negative sentiment may push the value down further. Transaction costs may render a “round-trip” of shorting and subsequently covering to go long not worthwhile. Decisions of this sort are faced routinely by both quantitative and fundamental traders.

Disagreement among alpha models defined at various horizons is, in fact, commonplace in quantitative trading. Garleanu and Pedersen (2009) studied the multiperiod quantitative-trading problem under the somewhat restrictive assumptions that the alpha models follow mean-reverting dynamics and that the only source of trading frictions is purely linear market impact (leading to purely quadratic impact-related trading costs).

A third problem, related to the first two, is the practicality of hedging derivative contracts when the trading cost of dynamic offsetting replicating portfolios is taken into account. This problem is routinely faced by the office of the CIO at an investment bank, who must balance risk with the cost of trading a large hedging position.

In this paper we present a general framework which encompasses all of these types of problems, and which establishes an intuitively appealing link to the theory of Bayesian statistics. Intuition is most valuable when it is also useful, however, and perhaps the best feature of our framework is that intuition leads to a straightforward algorithm for solving the problem. This algorithm is especially useful in the realistic case when market impact is nonlinear and overall trading cost may not even be differentiable, and when various real-world portfolio constraints are present. We plan to provide more technical details and numerical examples in a companion paper (Kolm and Ritter, 2014).

1 Intuition and a Probabilistic View

We now place ourselves into the position of a rational agent planning a sequence of trades beginning presently and extending into the future. We wish to understand, and ultimately maximize, the agent’s utility function. We can conceptualize utility in terms of decisions and outcomes. Conditional on a decision $d$, the probability of outcome $w$ is $p(w \mid d)$. A decision $d$ is chosen by maximizing $\mathbb{E}[U(w) \mid d]$ where $U(w)$ quantifies the agent’s utility associated to consequence $w$.

In trading problems, decisions are typically modeled as the decision to hold a specific portfolio sequence $x = (x_1, x_2, \ldots, x_T)$, where $x_t$ is the portfolio the agent plans to hold at time $t$ in the future. Often the relevant outcome is the trading profit. If $r_{t+1}$ is the vector of asset returns over $[t, t+1]$, then the profit associated to decision $d = x = (x_1, x_2, \ldots, x_T)$ is

$$\pi(x) = \sum_t \left[ x_t \cdot r_{t+1} - c_t(x_{t-1}, x_t) \right] \qquad (1)$$

where $c_t(x_{t-1}, x_t)$ is the total cost (including but not limited to market impact, spread pay, borrow costs, ticket
charges, financing, etc.) associated with holding portfolio $x_{t-1}$ at time $t-1$ and ending up with $x_t$ at time $t$.

Trading profit $\pi(x)$ is a random variable, since many of its components are future quantities unknowable at time $t = 0$. Final wealth $w$ must equal initial wealth plus the trading profit, $w = w_0 + \pi(x)$. Consider the standard CARA utility function $U(w) = -e^{-\gamma w}$, where $\gamma > 0$ is the Arrow-Pratt index of absolute risk aversion. If $\pi(x)$ is normally distributed, then $U(w)$ is negative-lognormal. It follows that maximization of $\mathbb{E}[U(w)]$ is accomplished by maximizing $u(x)$ defined by

$$u(x) := \mathbb{E}[\pi(x)] - (\gamma/2)\,\mathbb{V}[\pi(x)] \qquad (2)$$

Investors differ in the utility they associate to final wealth, and the CARA form is not universally appropriate. Nonetheless, Levy and Markowitz (1979) and Kroll, Levy, and Markowitz (1984) demonstrate that, in practice, mean-variance portfolios are good approximations for those optimizing expected values of several other commonly-used forms of utility. Thus optimization of $u(x)$ given by (2) is of particular interest, but our method is general enough to include utility functions which cannot easily be derived from mean-variance analysis.

We will often refer to a planned portfolio sequence $x = (x_1, x_2, \ldots, x_T)$ simply as a “path.” Similarly we sometimes refer to (2) as the “utility of the path $x$,” while remembering the more complex link to utility theory noted above. Our task, in this simpler language, is to find the maximum-utility path $x^* = \operatorname{argmax}_x u(x)$.

The rest of this section is devoted to mapping this multiperiod optimization problem onto a statistical estimation problem. Specifically, our optimization problem is dual, or mathematically equivalent, to the problem of estimating a time series of unobservable (or “hidden”) states which track a given observable sequence. This probabilistic view leads to new intuition and new computational algorithms. We focus on intuition first, and discuss the algorithms thereafter.

Intuition 1. Consider three hypothetical traders: (1) the ideal trader, who lives in a world without transaction costs and does what would be optimal in that world, (2) the optimal trader (or utility-maximizer), who lives in the real world and trades to maximize $u(x)$, and (3) the random trader who selects trading paths with probability given by an increasing function of their utility. The random trader’s most likely course of action is to match the optimal trader.

The “ideal” trader still does not know the future, but is “ideal” in the sense of operating in an idealized transaction-cost-free, infinitely-liquid world. Indeed, the world of the ideal trader is a very useful theoretical device. All studies of portfolio theory which do not explicitly include transaction costs are, in effect, studies of an ideal trader. Inclusion of transaction costs into such a study amounts to optimally tracking the portfolio sequence of the ideal trader, as we shall see.

The possibility of constructing the random trader as a theoretical device cannot be questioned, even if its usefulness is not obvious yet. Indeed, we can define for a constant $\kappa > 0$,

$$p(x) = \frac{1}{Z} \exp(\kappa\, u(x)), \qquad Z := \int \exp(\kappa\, u(x))\, dx \qquad (3)$$

In realistic models, $Z$ is always finite.² Optimizing utility is then equivalent to predicting the most likely action of a randomly-acting agent whose actions are probability-weighted by (3). The constant $\kappa$ is analogous to the inverse temperature in statistical physics.

² Mathematically speaking, this is so because $u(x)$ is dominated at large $x$ by either cost or variance, which are negative terms, and moreover there are practical limits on trade size.

This transformation shifts the problem from utility-maximization to one of understanding a particular stochastic process: $p(x)$, the process model of the random trader. We will see in due course that this probabilistic view has a number of advantages. The context of a probability model suggests applying Monte Carlo methods and, in particular, methods for estimating the mode of a distribution. It invites us to investigate the meaning of probabilistic notions, such as the Markov property, in the context of trading. It clarifies the connection between optimal trading and well-known random process models, such as the Linear-Gaussian state space model. The random process model also adapts easily to extensions of the above, such as constrained problems or problems where the space of possible trades is finite. We will return to these points once we have further explored the structure of $p(x)$.

Intuition 2. At time $t$, the decision of what to do next should depend on the history of past positions $(x_0, x_1, \ldots, x_t)$ only through the current position $x_t$, and possibly through additional Markov state variables as discussed below. The random trader’s process model should therefore have the Markov property.

Intuition 2 is perhaps the most controversial of the several we shall explore, and it does limit the class of models we will consider. It says that if our current position in a stock is 100 shares, our decision to buy or sell should not depend on whether we originally acquired the 100 shares as 50 shares on each of two separate days, or 10 shares on each of 10 days, or any other sequence of trades ending in 100 shares. Most kinds of forecasts of a company’s economic future which drive everyday trading decisions do not violate the Markov property. For example, forecasts based on analysis of fundamentals, news,

market data, macroeconomic conditions, the company’s relationship to the broader market, and many others are compatible with the Markov property. Externalities can introduce dependence on past trades, but our present goal is to determine the actions of a utility-maximizer who is not subject to such externalities.

The need for “additional Markov state variables” to satisfy Intuition 2 arises primarily from the persistence of market impact. Market impact arises through the interaction of traders and market makers as studied by Kyle (1985) and extended by many authors. The related subfield of financial economics, known as market microstructure theory, predicts that the price impact of executing a reasonably large order has two components: permanent and temporary.

Permanent impact corresponds to the case in which the market’s actual consensus view of the value of the security changes as a result of the trade. There is no specific timeframe for this effect to reverse itself. By contrast, temporary impact is purely liquidity-driven and hence it eventually decays away, although the decay can occur over multiple periods. Generally, persistent impact refers to all market impact that has not decayed away within one period.

We now construct a memoryless model accounting for both forms of persistent impact. Let $d_t$ denote a vector which contains, for each stock, the net effect of (decaying) temporary impact from all past trades as of time $t$. Let $p_t$ denote the analogous vector of permanent impact. Then the augmented state variable $(x_t, d_t, p_t)$ is the state vector for a memoryless process. Henceforth we assume that $d_t$ and $p_t$ are kept along with the state variable, but suppressed in the notation.

Intuition 3. Let $y = (y_1, y_2, \ldots, y_T)$ be the portfolio sequence of the ideal trader, defined as the optimal sequence in a world free of trading costs. The optimal sequence in the real world is obtained by tracking $y$ (i.e. minimizing some notion of tracking error relative to $y$) in a cost-efficient manner.

To further clarify Intuition 3, we explain how it pertains to three important kinds of dynamic portfolio strategies: liquidation, derivative hedging, and alpha trading.

Of the three, liquidation is the simplest. Optimal liquidation amounts to optimally tracking a portfolio with no positions, i.e. $y_t = 0$ for all $t$. For hedging exposure to derivatives, $y_t$ should be our expectation of the offsetting replicating portfolio at all future times until expiration. For agents with alpha forecasts and mean-variance preferences, the ideal sequence $y_t = (\gamma\Sigma_t)^{-1}\,\mathbb{E}[r_{t+1}]$ is familiar from classical mean-variance analysis, since in the absence of trading costs, the problem reduces to a sequence of decoupled one-period optimizations. Tracking the portfolios of Black and Litterman (1992) is also a special case of our framework in which $y_t$ is the solution to a mean-variance problem with a Bayesian posterior distribution for the expected returns (since the posterior is Gaussian, one simply replaces $\alpha_t$ and $\Sigma_t$ with the appropriate quantities).

Intuition 4. The process model of the random trader is a hidden Markov model (HMM). The optimal trading path is the most likely sequence of hidden states, conditional on the ideal path $y$.

A Hidden Markov Model is based on a pair of coupled stochastic processes $(X_t, Y_t)$ in which $X_t$ is Markov and is never observed directly. Information about $X_t$ can only be inferred by means of $Y_t$ which is observable and “contemporaneously coupled,” meaning that $Y_t$ is coupled to $X_t$, but not to $X_s$ for $s \neq t$. This coupling has a stochastic component, and the conditional probability $p(Y_t \mid X_t)$ is known to us, along with the transition probability $p(X_t \mid X_{t-1})$ of the hidden process. These two types of terms will turn out to be exactly what we need to model the multiperiod portfolio problem.

In a given optimization problem, trading paths $x$ of the random trader will be modeled as realizations of the Markov process $X_t$, and the ideal sequence $y = (y_t)$ will correspond to a realization of the observable $Y_t$. The random trader’s process model, called simply $p(x)$ above, will correspond to the density of $x$ conditional on the ideal sequence $y$, and is actually $p(x \mid y)$. As foreshadowed by Intuitions 1 and 3, the most likely realization of $X_t$ conditional on $y$ is the optimal sequence.

The Markov property and the assumption that $Y_t$ has only contemporaneous coupling to $X_t$ together imply

$$p(x \mid y) = \prod_t p(y_t \mid x_t)\, p(x_t \mid x_{t-1}) \qquad (4)$$

Any factorization of a joint density can be represented graphically in a way that highlights conditional dependence relations. From each variable, one draws arrows to any other variables which are conditioned on that variable in the given factorization. The graph for (4) is

$$
\begin{array}{ccccccc}
 & & y_t & & y_{t+1} & & \\
 & & \big\uparrow {\scriptstyle p(y_t \mid x_t)} & & \big\uparrow {\scriptstyle p(y_{t+1} \mid x_{t+1})} & & \\
\cdots & \longrightarrow & x_t & \xrightarrow{\; p(x_{t+1} \mid x_t) \;} & x_{t+1} & \longrightarrow & \cdots
\end{array}
\qquad (5)
$$

Such graphical models are often referred to as Bayesian networks. Taking logs, (4) becomes

$$\log p(x \mid y) = \sum_t \left[ \log p(y_t \mid x_t) + \log p(x_t \mid x_{t-1}) \right] \qquad (6)$$

Logical reasoning about the structure of the terms in (6) reveals the economic aspects of the utility function to which they must correspond. The term $\log p(x_t \mid x_{t-1})$

is the only term which couples $x_t$ with its predecessor $x_{t-1}$, so this term must account for all trading frictions. In other words, up to the normalization constant which makes $p(x_t \mid x_{t-1})$ a density,

$$-\log p(x_t \mid x_{t-1}) = c_t(x_{t-1}, x_t) \qquad (7)$$

Similarly, $\log p(y_t \mid x_t)$ is the only term which couples $y_t$ and $x_t$, and so it must model the utility from “closeness” or “proximity” to $y$. Since this term only concerns a single moment in time, it could not possibly model anything related to portfolio transitions. Defining $b(x_t, y_t)$ as the total dis-utility related to not tracking $y_t$ exactly, we are led to

$$-\log p(y_t \mid x_t) = b(x_t, y_t) \qquad (8)$$

Having derived a general duality between portfolio-tracking problems and Hidden Markov Models, we return to the important special case of mean-variance preferences, in which $u(x) := \mathbb{E}[\pi(x)] - (\gamma/2)\,\mathbb{V}[\pi(x)]$. Here, $\mathbb{E}$ and $\mathbb{V}$ denote mean and variance as forecast at time $t = 0$. It is of interest to determine $y_t$ and the function $b(x, y)$. Defining $\alpha_t := \mathbb{E}[r_{t+1}]$ and $\Sigma_t := \mathbb{V}[r_{t+1}]$,

$$u(x) = \sum_t \left[ x_t^\top \alpha_t - \frac{\gamma}{2}\, x_t^\top \Sigma_t x_t - c_t(x_{t-1}, x_t) \right] \qquad (9)$$

Neglecting terms that do not depend on $x$, the first two terms of (9) are equivalent to the negative of

$$b_{\gamma\Sigma_t}(x_t, y_t) := \frac{1}{2}\, (y_t - x_t)^\top \gamma\Sigma_t\, (y_t - x_t) \qquad (10)$$

where $y_t = (\gamma\Sigma_t)^{-1} \alpha_t$. The latter is a classic mean-variance portfolio, which is well-known to be the solution to a myopic problem without costs or constraints, and $b_{\gamma\Sigma_t}$ measures variance of the tracking error. Then

$$u(x) = -\sum_t \left[ b_{\gamma\Sigma_t}(x_t, y_t) + c_t(x_{t-1}, x_t) \right] \qquad (11)$$

Eqns. (7) and (8) specify how to map the two terms in (11) onto $p(y_t \mid x_t)$ and $p(x_t \mid x_{t-1})$ in the associated Hidden Markov Model. The observation channel $p(y_t \mid x_t)$ is the Gaussian density whose negative log is (10), and the transition density is related to transaction cost by (7). If $c_t(x_{t-1}, x_t)$ is a quadratic function of $\Delta x_t = x_t - x_{t-1}$ as in Garleanu and Pedersen (2009), then $p(x_t \mid x_{t-1})$ is Gaussian as well. When both $p(y_t \mid x_t)$ and $p(x_t \mid x_{t-1})$ are Gaussian, the total utility (11) is quadratic and the associated HMM is a linear-Gaussian state space model, solved explicitly by the Kalman smoothing recursions. For details of the Kalman smoother, we refer to Durbin and Koopman (2012).

In summary, mean-variance-cost optimization reduces to tracking the ideal sequence $y_t = (\gamma\Sigma_t)^{-1} \alpha_t$, in accordance with Intuition 3. We conclude this section by showing how our framework elegantly handles two important extensions: statistical uncertainty in parameter estimation, and portfolio constraints.

In practice, the parameters that go into return forecasts, $\alpha_t$, and risk forecasts, $\Sigma_t$, are subject to estimation error, like all statistical estimators. Out-of-sample variance depends on the precision of parameter estimates. Fortunately, this type of variance is easily handled by standard Bayesian methods. One must compute (2) with respect to a different probability measure, i.e.

$$\mathbb{E}_{\hat p}[\pi(x)] - (\gamma/2)\,\mathbb{V}_{\hat p}[\pi(x)],$$

where the mean and the variance must use the posterior predictive density $\hat p$ for returns. Letting $\theta_t$ denote the full collection of all parameters in our model for $r_t$, and letting $p_t(\theta_t)$ denote the posterior density of $\theta_t$ in our Bayesian model after all data has been assimilated, the predictive density for $r_t$ is

$$\hat p_t(r_t) := \int p_t(r_t \mid \theta_t)\, p_t(\theta_t)\, d\theta_t,$$

and the mean-variance investor must calculate $\mathbb{E}_{\hat p}$ and $\mathbb{V}_{\hat p}$ using $r_t \sim \hat p_t(r_t)$.

A strength of the probabilistic framework is its elegant conceptual handling of constraints.

Intuition 5. Constraints are regions of path space with zero probability.

For example, a long-only constraint simply means that $p(x) = 0$ if the path $x$ contains short positions. Practically, this means that sampling from $p$ will never generate sample paths which are infeasible with respect to the constraints, and that the global maximum of $p$ is always a feasible path, if one exists.

2 Reduction of the Multi-Asset Case to the Single-Asset Case

We consider the general case of $N$ assets, $N > 1$, and show that this problem can be reduced to iteratively finding optimal single-asset paths. We then solve the single-asset case in the next section.

We will make the fairly weak assumption that our notion of distance $b(x_t, y_t)$ from the ideal sequence $y_t$ is a function that is convex and differentiable. These conditions are satisfied by the positive-definite quadratic form

$$\frac{1}{2}\, (y_t - x_t)^\top (\gamma\Sigma_t) (y_t - x_t) = b_{\gamma\Sigma_t}(x_t, y_t) \qquad (12)$$

considered above. Importantly, this still allows for non-differentiable trading costs.

We model trading cost as separable in the sense that

it is additive over assets,

$$c_t(x_{t-1}, x_t) = \sum_i c_t^i(x_{t-1}^i, x_t^i) \qquad (13)$$

where the superscript $i$ always refers to the $i$-th asset. For some kinds of costs, such as commissions or borrow costs, separation (13) is true by construction. For impact, this says that to estimate the market impact we would have if we were to trade 1% of the average daily volume (ADV) in AAPL, we do not need to know our position in IBM or the trades we intend to do in IBM.

Hence the non-differentiable (and generally more complicated) term in $u(x)$ is separable across assets. If the differentiable term (12) were also separable, we could optimize each asset’s trading path independently without considering the others, but we can’t: the differentiable term is usually not separable. Intuitively, trading in any one asset could either increase or decrease the tracking error variance, depending on the positions in the other assets.

Since $x = (x_1, \ldots, x_T)$ denotes a trading path for all assets, let $x^i = (x_1^i, \ldots, x_T^i)$ denote the projection of this path onto the $i$-th asset. Let $c^i(x^i)$ denote the total cost of the $i$-th asset’s trading path. We require that each $c^i$ be a convex function on the $T$-dimensional space of trading paths for the $i$-th asset.³ Putting this all together, we want to minimize $f(x) = -u(x)$ where

$$f(x) = b(y - x) + \sum_i c^i(x^i) \qquad (14)$$

$b$: convex, continuously differentiable
$c^i$: convex, non-differentiable

³ This is true for a wide variety of cost functions that have been considered. For example, the model of Kyle and Obizhaeva (2011) has this property, as does borrow cost, market impact as in Almgren et al. (2005), piecewise-linear functions, and sums of these and other convex functions.

Consider the following blockwise coordinate descent (BCD) algorithm. Choose an initial guess for $x$, and set $i = 1$. Iterate the following until convergence:

1. Optimize $f(x)$ over $x^i$, holding $x^j$ fixed for all $j \neq i$. Denote this optimum by $\hat{x}^i$.

2. Update $x$ by setting the coordinates relevant to the $i$-th asset, $x^i$, equal to $\hat{x}^i$.

3. If $i = N$, set $i = 1$; otherwise set $i = i + 1$.

Seminal work of Tseng (2001) shows that for $f(x)$ of the form (14), under fairly mild continuity assumptions, any limit point of the BCD iteration is a minimizer of $f(x)$. See also Tseng and Yun (2009) and Tseng (1988). Note that for a generic non-differentiable convex function, there is no reason to expect BCD to find the global minimum, and it’s trivial to construct examples where it fails to do so for almost any starting point. The key assumption that “the non-differentiable part is separable” as in (14) is really necessary for BCD to work.

Intuition 6. The optimal multi-asset trading path $x$ can be found by treating each asset in turn, keeping positions in the others held fixed, and cycling through assets until convergence. Each single-asset optimal path is immediately incorporated into $x$ before proceeding to the next asset.

If $b(y - x)$ is a quadratic function, such as (12) summed over $t$, then it projects to a lower-dimensional quadratic when $x^j$ ($j \neq i$) are held fixed and $x^i$ alone is allowed to vary. In this case, each iteration calls for minimizing $q(x^i) + c^i(x^i)$ where $q(x^i)$ is quadratic. This subproblem is, mathematically, a single-asset problem, and yet the coefficients of the quadratic function $q(x^i)$ depend on the rest of the portfolio. This is as it should be. Intuitively, increasing holdings of the $i$-th asset could increase the portfolio risk, or it could actually reduce the portfolio risk if the $i$-th asset is a hedge. One needs to at least know the risk exposures of the rest of the portfolio when performing optimization for the $i$-th asset’s trading path.

In summary, we have shown how to reduce the multi-asset multiperiod problem to iterations of the single-asset multiperiod problem, with convergence assured by the theorem of Tseng (2001).

3 Finding Optimal Paths: One Asset, Multiple Periods

Now let us consider the multiperiod problem for a single asset. In this case, the ideal sequence $y = (y_t)$ and the optimal holdings (or equivalently, hidden states) $x = (x_t)$ are both univariate time series. We address the problem of optimizing utility of the path, represented by (6), in this important special case. Since the multiperiod many-asset problem can be reduced to iteratively solving a sequence of single-asset problems, the methods we develop in this section are important even if our main interest is in multi-asset portfolios.

Certain special cases lend themselves to treatment by fast special-purpose optimizers. For example, if all of the terms in (6) happen to be quadratic (i.e. logs of Gaussians) and there are no constraints, then the associated HMM is a linear-Gaussian state space model and the appropriate tool is the Kalman smoother. If the state space is continuous, and if the objective function and all constraints are convex and differentiable, then modern convex solvers (Boyd and Vandenberghe, 2004) apply.

A very important class of examples arises when there are no constraints, but the cost function is a convex and

non-differentiable function of the difference or “trade”

$$\delta_t := x_t - x_{t-1}. \qquad (15)$$

This allows for non-quadratic terms as in Almgren et al. (2005) and non-differentiable terms such as the spread term of Kyle and Obizhaeva (2011). In this case, it turns out we can use Tseng’s theorem again, applied to trades $\delta_t$ rather than positions $x_t$. Write

$$x_t = x_0 + \sum_{s=1}^{t} \delta_s.$$

The single-asset utility function is then

$$u(x) = -\sum_t \left[ b\Bigl( x_0 + \sum_{s=1}^{t} \delta_s,\; y_t \Bigr) + c_t(\delta_t) \right] \qquad (16)$$

The single-asset utility function (16) satisfies the convergence criteria of Tseng (2001), i.e. that the non-differentiable term is separable across time as a function of $\{\delta_t\}$, while the non-separable term is differentiable. One may thus adapt the coordinate descent algorithm described above as follows. Choose an initial guess $\delta = \{\delta_1, \ldots, \delta_T\}$, and for each $t = 1, \ldots, T$, optimize (16) over $\delta_t$ while holding $\{\delta_s : s \neq t\}$ fixed, leading to the optimal trade $\hat\delta_t$. Set the $t$-th coordinate of $\delta$ equal to $\hat\delta_t$ and increment $t$, returning cyclically to $t = 1$ once $t > T$. As long as $c_t(\delta_t)$ is a convex function of $\delta_t$ and $b$ is differentiable and convex, this algorithm converges to the single-asset optimal path. Note that each step requires only a univariate optimization. A reasonable initial guess for $\delta$ may be obtained from a Kalman smoother solution obtained using a quadratic approximation to cost.

Hence in this important and large class of examples, multiperiod optimization for a single asset can be reduced to a sequence of one-dimensional optimization problems. The coordinate descent algorithm we propose here is also widely used in statistics, where it is among the fastest-known algorithms for solving lasso and elastic-net regression problems; see Friedman et al. (2007) and Friedman, Hastie, and Tibshirani (2010).

We now present a general-purpose method which is slower than the method just described, because it is a Monte Carlo statistical method, but which works for absolutely any cost function (irrespective of differentiability, convexity, or other concerns), and any constraints which can be expressed as single-asset constraints. It handles cases where a discrete solution is actually preferred over a continuous one, such as when trading is desired to be in round lots. This method is based on the HMM representation (4), (5), and (6).

Stocks and most other assets trade in integer multiples of a fundamental unit, so the state space is finite, but so large that it is well approximated by a continuous one. Nonetheless, a finite state space could be a useful tool. If the state space were finite, we could follow standard practice for finding the most likely state sequence in a finite HMM, which is to use the ingenious algorithm due to Viterbi (1967).

Viterbi’s algorithm is general enough to allow the set of available states to change through time. Let $S_t$ denote the (finite) state space at time $t$. First, run through time in the forward direction, calculating for each time $t$ and every state $x_t \in S_t$, the probability $v_t(x_t)$ of the most-probable state sequence ending in state $x_t$ after $t$ steps. Calculation of $v_t(x_t)$ is done recursively, noting that any sequence ending in state $x_t$ can be broken up into a subsequence of $t-1$ steps (ending, say, at $x_{t-1}$) plus a transition from $x_{t-1}$ to $x_t$. By the optimality principle of Bellman (1957), the subsequence contributing to $v_t(x_t)$ must have been the most probable sequence ending at $x_{t-1}$ in $t-1$ steps, so its probability is $v_{t-1}(x_{t-1})$. Hence, for every $x_t \in S_t$, compute

$$v_t(x_t) = \max_{x_{t-1}} \left[ p(x_t \mid x_{t-1}, y_t)\, v_{t-1}(x_{t-1}) \right] \qquad (17)$$

and save the state which achieved the maximum for later use. The endpoint of the optimal sequence is then $x_T^* = \operatorname{argmax}_{x_T} v_T(x_T)$. Finally, backtrack from $x_T^*$ using the states saved in the previous step to recursively find the full optimal sequence.

Eqn. (17) is essentially the Bellman equation. For numerical stability one typically works with log-probability. Taking logs transforms (17) to an additive form in which $\log v_t(x_t)$ is Bellman’s value function.

If $K = \max_t |S_t|$ is the maximal number of states, the time and space requirements of the Viterbi algorithm are both $O(K^2 T)$, which means that we need to control $K$ by working in a judiciously-chosen smaller state space, and yet we must ensure that a good approximation to the optimal path can still be found in this smaller state space. This is precisely what sampling from $p(x)$ accomplishes, because it typically generates paths in the region of path space near the mode, where most of the probability mass is located, and we are free to tune the arbitrary constant $\kappa$ in (3) to achieve reasonable coverage of the relevant region of path space. The union of all points comprising all of the paths sampled from $p(x)$ is the smaller state space we need.

In fact, sampling from $p(x)$ is much easier than sampling from a generic $NT$-dimensional density because the structure of (4) allows the use of sequential Monte Carlo (SMC) methods. The nonlinear filtering technique based on SMC is known as the particle filter; for details see Doucet and Johansen (2009, and references therein).
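The remark that taking logs turns (17) into an additive recursion can be written out as follows. This is only a sketch under two assumptions that are not stated in this form in the text: that the factor $p(x_t \mid x_{t-1}, y_t)$ in (17) is read as the product $p(y_t \mid x_t)\, p(x_t \mid x_{t-1})$ appearing in (4), and that normalization constants do not depend on the states, so they can be dropped. The symbol $J_t$ is introduced here purely for illustration.

```latex
\begin{align*}
J_t(x_t) &:= -\log v_t(x_t) \\
         &= \min_{x_{t-1} \in S_{t-1}}
            \bigl[ -\log p(y_t \mid x_t) - \log p(x_t \mid x_{t-1}) + J_{t-1}(x_{t-1}) \bigr] \\
         &= b(x_t, y_t) + \min_{x_{t-1} \in S_{t-1}}
            \bigl[ c_t(x_{t-1}, x_t) + J_{t-1}(x_{t-1}) \bigr]
            \qquad \text{by (7) and (8)}.
\end{align*}
```

Under these assumptions, maximizing the path probability amounts to minimizing accumulated tracking dis-utility plus trading cost, and $-J_t$ plays the role of the value function mentioned above.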

Intuition 7. If we draw sufficiently many sample paths from the density p(x), then the union of the points in all of those paths is a discretization of the region of path space near the optimal path. Applying the Viterbi algorithm to this “smaller state space” gives a good approximation of the optimal path, which becomes a better approximation as more sample paths are added.

Godsill, Doucet, and West (2001) proved that the algorithm suggested by Intuition 7 converges to the most likely hidden state sequence, i.e. the mode of p(x | y). This algorithm works in part because the Viterbi algorithm has full freedom to choose any path through the set of points formed as the union of the Monte Carlo samples.
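The idea in Intuition 7 can be illustrated with a small single-asset sketch in Python. Everything concrete here is an assumption made for illustration: the ideal path `y`, the coefficient values, the function names, and especially `sample_paths`, which is a crude forward Gaussian proposal standing in for a real SMC sampler (it ignores the absolute-value part of the cost and is not the particle filter cited above). The Viterbi pass then works in negative-log form, minimizing tracking dis-utility plus trading cost over the union of sampled points.

```python
import random


def tracking_cost(x, target, g=1.0):
    # Single-asset form of (10): 0.5 * g * (target - x)^2, with g standing for gamma * variance.
    return 0.5 * g * (target - x) ** 2


def trade_cost(delta, lam1=0.3, lam2=0.1):
    # Cost of the form (18): lam1 * |delta| + lam2 * delta^2 (coefficient values are made up).
    return lam1 * abs(delta) + lam2 * delta ** 2


def path_cost(path, y, x0=0.0):
    # Negative utility of a path: tracking dis-utility plus trading cost, summed over time.
    total, prev = 0.0, x0
    for x, target in zip(path, y):
        total += tracking_cost(x, target) + trade_cost(x - prev)
        prev = x
    return total


def sample_paths(y, n_paths, kappa, rng, x0=0.0, g=1.0, lam2=0.1):
    # Stand-in sampler (an assumption, NOT a particle filter): each step draws from the
    # Gaussian proportional to exp(-kappa * [tracking cost + quadratic part of trade cost]).
    mix = g / (g + 2.0 * lam2)
    scale = (kappa * (g + 2.0 * lam2)) ** -0.5
    paths = []
    for _ in range(n_paths):
        prev, path = x0, []
        for target in y:
            prev = rng.gauss(mix * target + (1.0 - mix) * prev, scale)
            path.append(prev)
        paths.append(path)
    return paths


def viterbi_on_samples(paths, y, x0=0.0):
    # State set at time t is the union of all sampled positions at time t.
    grids = [sorted({path[t] for path in paths}) for t in range(len(y))]
    value = {s: tracking_cost(s, y[0]) + trade_cost(s - x0) for s in grids[0]}
    pointers = []
    for t in range(1, len(y)):
        new_value, pointer = {}, {}
        for s in grids[t]:
            # Best predecessor of s: lowest cost-so-far plus cost of trading into s.
            r = min(value, key=lambda q: value[q] + trade_cost(s - q))
            new_value[s] = value[r] + trade_cost(s - r) + tracking_cost(s, y[t])
            pointer[s] = r
        pointers.append(pointer)
        value = new_value
    # Pick the best endpoint, then backtrack through the saved predecessors.
    best = [min(value, key=value.get)]
    for pointer in reversed(pointers):
        best.append(pointer[best[-1]])
    return best[::-1]


rng = random.Random(0)
y = [-0.5, -0.2, 0.3, 0.6, 0.5, 0.3, 0.1, 0.0]  # hypothetical ideal path
paths = sample_paths(y, n_paths=50, kappa=20.0, rng=rng)
viterbi_path = viterbi_on_samples(paths, y)
best_sampled = min(path_cost(p, y) for p in paths)
```

Because every sampled path is itself a path through the per-period state sets, the Viterbi result can never cost more than the best individual sample; it is free to splice together pieces of different samples, which is the point made in the paragraph above.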
Figure 1: The solid line (“Kalman”) is the solution to a quadratic-cost problem which has the highest true-utility among all solutions to all quadratic-cost problems with the same $y_t$, $\gamma\Sigma_t$. The dashed line (“Viterbi”) is the path which optimizes true-utility over all possible paths.

As a proof of concept, we study optimal trading with a stylized alpha term structure and with the non-differentiable cost function

$$c_t(\delta_t) = \lambda_1 |\delta_t| + \lambda_2 \delta_t^2 \qquad (18)$$

derived by Kyle and Obizhaeva (2011) from market microstructure invariance. The coefficients $\lambda_1$, $\lambda_2$ are functions of volatility and volume.

It is tempting to wonder whether a purely-quadratic approximation to cost (i.e. $\lambda_1 = 0$ and some other value of $\lambda_2$) might suffice, since such a problem could be easily solved by the Kalman smoother. One should resist this temptation, however (except for the purpose of generating a good starting point for BCD iteration). Consider the class of problems with the same $y_t$, $\gamma$, and $\Sigma_t$ but with purely-quadratic costs. Often, there is no solution to any problem in this class whose true utility, as computed with the true cost function, comes close to the global maximum of true utility. In other words, no quadratic approximation is good enough, and more generally, getting the mathematical form of the cost function right and using it in optimization is of great importance.

4 Conclusions

We have presented a new theoretical framework for multiperiod optimization with transaction costs which recasts the problem as estimation of a hidden state sequence in a Markov chain. This framework is general enough to encompass the vast majority of the multiperiod portfolio choice and portfolio tracking problems that have thus far appeared in the literature. Constraints are incorporated gracefully with no change to the fundamental theory. The framework leads naturally to practical optimization methods which are shown to converge for a large class of cost functions.

References
Almgren, Robert and Neil Chriss (1999). “Value under
Our example term structure is comprised of two liquidation”. In: Risk 12.12, pp. 61–63.
exponentially-decaying alpha models. Model 1 is initially Almgren, Robert et al. (2005). “Direct estimation of eq-
25bps, with half-life 4 periods, while Model 2 is initially uity market impact”. In: Risk 57.
-40bps with a shorter half-life of 2 periods. Adding these Bellman, R. (1957). Dynamic Programming. Princeton
two models produces a term structure that is negative, University Press, Princeton, NJ.
then positive, then decays to almost zero within about 20 Black, Fischer and Robert Litterman (1992). “Global
periods. portfolio optimization”. In: Financial Analysts Jour-
Figure 1 shows that the best possible Kalman path nal, pp. 28–43.
(solid line) places a larger number of trades than the Boyd, Stephen P and Lieven Vandenberghe (2004). Con-
Viterbi path, but the individual trades are smaller. This vex optimization. Cambridge university press.
is because the purely-quadratic cost function is over- Doucet, Arnaud and Adam M Johansen (2009). “A tu-
estimating the true cost, estimated by (18), of large trades torial on particle filtering and smoothing: Fifteen
and under-estimating the true cost of small trades. The years later”. In: Handbook of Nonlinear Filtering 12,
absolute-value term in (18) allows sparse solutions, as is pp. 656–704.
familiar from elastic-net regression. The particle filter Durbin, James and Siem Jan Koopman (2012). Time se-
and subsequent Viterbi estimation ran in a few seconds ries analysis by state space methods. 38. Oxford Uni-
on a notebook computer. versity Press.


Friedman, Jerome, Trevor Hastie, and Rob Tibshirani (2010). “Regularization paths for generalized linear models via coordinate descent”. In: Journal of statistical software 33.1, p. 1.
Friedman, Jerome et al. (2007). “Pathwise coordinate optimization”. In: The Annals of Applied Statistics 1.2, pp. 302–332.
Garleanu, Nicolae B and Lasse H Pedersen (2009). Dynamic trading with predictable returns and transaction costs. Tech. rep. National Bureau of Economic Research.
Godsill, Simon, Arnaud Doucet, and Mike West (2001). “Maximum a posteriori sequence estimation using Monte Carlo particle filters”. In: Annals of the Institute of Statistical Mathematics 53.1, pp. 82–96.
Kolm, Petter and Gordon Ritter (2014). “Bayesian Dynamic Models and General Multi-period Portfolio Choice”. In: in preparation.
Kroll, Yoram, Haim Levy, and Harry M Markowitz (1984). “Mean-Variance versus Direct Utility Maximization”. In: The Journal of Finance 39.1, pp. 47–61.
Kyle, Albert Pete and Anna Obizhaeva (2011). “Market Microstructure Invariants: Theory and Implications of Calibration”. In: Available at SSRN 1978932.
Kyle, Albert S (1985). “Continuous auctions and insider trading”. In: Econometrica: Journal of the Econometric Society, pp. 1315–1335.
Levy, Haim and Harry M Markowitz (1979). “Approximating expected utility by a function of mean and variance”. In: The American Economic Review 69.3, pp. 308–317.
Merton, Robert C (1969). “Lifetime portfolio selection under uncertainty: The continuous-time case”. In: Review of Economics and statistics 51.3, pp. 247–257.
Samuelson, Paul A (1969). “Lifetime Portfolio Selection By Dynamic Stochastic Programming”. In: The Review of Economics and Statistics 51.3, pp. 239–246.
Tseng, Paul (1988). Coordinate ascent for maximizing nondifferentiable concave functions. Technical Report LIDS-P 1840. Massachusetts Institute of Technology, Laboratory for Information and Decision Systems.
Tseng, Paul (2001). “Convergence of a block coordinate descent method for nondifferentiable minimization”. In: Journal of optimization theory and applications 109.3, pp. 475–494.
Tseng, Paul and Sangwoon Yun (2009). “A coordinate gradient descent method for nonsmooth separable minimization”. In: Mathematical Programming 117.1-2, pp. 387–423.
Viterbi, Andrew (1967). “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm”. In: Information Theory, IEEE Transactions on 13.2, pp. 260–269.
