
QUANTITATIVE ECONOMICS

Thomas Sargent and John Stachurski

February 5, 2014

CONTENTS

1 Introduction
  1.1 Overview
  1.2 What You Will Learn
  1.3 PDF or HTML?
  1.4 Structure of the Course

2 Programming in Python
  2.1 About Python
  2.2 Setting up Your Python Environment
  2.3 An Introductory Example
  2.4 Python Essentials
  2.5 Object Oriented Programming
  2.6 How it Works: Data, Variables and Names
  2.7 More Language Features

3 The Scientific Libraries
  3.1 NumPy
  3.2 SciPy
  3.3 Matplotlib
  3.4 Pandas
  3.5 IPython Shell and Notebook

4 Introductory Applications
  4.1 Linear Algebra
  4.2 Finite Markov Chains
  4.3 Shortest Paths
  4.4 Schelling's Segregation Model
  4.5 LLN and CLT
  4.6 Linear State Space Models
  4.7 A First Look at the Kalman Filter
  4.8 Infinite Horizon Dynamic Programming
  4.9 LQ Control Problems
  4.10 Rational Expectations Equilibrium

5 Advanced Applications
  5.1 Continuous State Markov Chains
  5.2 Modeling Career Choice
  5.3 On-the-Job Search
  5.4 Search with Offer Distribution Unknown
  5.5 Optimal Savings
  5.6 Robustness
  5.7 Linear Stochastic Models
  5.8 Estimation of Spectra
  5.9 Optimal Taxation

6 Solutions to Exercises
  6.1 Exercises from An Introductory Example
  6.2 Exercises from Python Essentials
  6.3 Exercises from Object Oriented Programming
  6.4 Exercises from More Language Features
  6.5 Exercises from NumPy
  6.6 Exercises from SciPy
  6.7 Exercises from Pandas
  6.8 Exercises from LLN and CLT
  6.9 Exercises from Finite Markov Chains
  6.10 Exercises from Schelling's Segregation Model
  6.11 Exercises from Linear State Space Models
  6.12 Exercises from A First Look at the Kalman Filter
  6.13 Exercises from Shortest Paths
  6.14 Exercises from Infinite Horizon Dynamic Programming
  6.15 Exercises from LQ Control Problems
  6.16 Exercises from Rational Expectations Equilibrium
  6.17 Exercises from Search with Offer Distribution Unknown
  6.18 Exercises from Modeling Career Choice
  6.19 Exercises from On-the-Job Search
  6.20 Exercises from Estimation of Spectra
  6.21 Exercises from Continuous State Markov Chains
  6.22 Exercises from Optimal Savings
  6.23 Exercises from Optimal Taxation

7 FAQs / Useful Resources
  7.1 FAQs
  7.2 How do I install Python?
  7.3 How do I start Python?
  7.4 How can I get help on a Python command?
  7.5 Where do I get all the Python programs from the lectures?
  7.6 What's Git?
  7.7 Other Resources
  7.8 IPython Magics
  7.9 IPython Cell Magics
  7.10 Useful Links

References

Note: You are currently viewing an automatically generated PDF version of our online lectures, which are located at

http://quant-econ.net

This PDF is generated from a set of source files that are orientated towards the website and to HTML output. At this stage the presentation quality is a bit less consistent than the website, and some internal links might not work.


CHAPTER ONE

INTRODUCTION

    "Science is what we understand well enough to explain to a computer; art is everything else." (Donald E. Knuth)

1.1 Overview

This website contains a sequence of lectures on economic modeling, focusing on the use of programming and computers for both problem solving and building intuition.

The primary programming language used in the lecture series is Python, a general purpose, open source programming language with excellent scientific libraries. (We'll tell you more about Python and why we chose it in the next lecture.)

At this stage, the level of the lectures varies from advanced undergraduate to graduate, although we intend to add more elementary applications in the near future.

The lectures are suitable for courses in quantitative methods and computational techniques, and also for self study and independent study groups.

To aid self study, all exercises have solutions. Our solutions are not the last word on each exercise; instead they provide one approach that demonstrates good coding practices.

1.2 What You Will Learn

If you work through the majority of the course and do the exercises, you will learn

- how to analyze a number of fundamental economic problems, from job search and neighborhood selection to optimal fiscal policy
- the core of the Python programming language, including the main scientific libraries
- good programming style
- how to work with modern software development tools such as debuggers and version control


- a number of mathematical topics central to economic modeling, such as
  - dynamic programming
  - finite and continuous Markov chains
  - filtering and state space models
  - Fourier transforms and spectral analysis
  - etc., etc.
- related numerical methods
  - function approximation
  - numerical optimization
  - simulation based techniques and Monte Carlo
  - recursion
  - etc., etc.
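As a small taste of the kind of computation these topics involve, here is a sketch of our own (not part of the course material) that approximates the stationary distribution of a made-up two-state Markov chain by iterating its transition matrix:

```python
import numpy as np

# Transition matrix of a hypothetical two-state chain (rows sum to one)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Iterate psi -> psi P until the distribution settles down
psi = np.array([1.0, 0.0])   # start with all mass on state 0
for _ in range(1000):
    psi = np.dot(psi, P)

print(psi)   # converges to the stationary distribution [0.8, 0.2]
```

Finite Markov chains, including the theory behind this convergence, are treated in detail in a later lecture.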

1.3 PDF or HTML?

You can view these lectures on-line or download the PDF version. (If you are reading this on-line, see the menu bar at the top of the page to download the PDF.)

If you decide to use the PDF, please be aware that

1. the PDF is automatically generated from source that is orientated towards the website and to HTML output, not PDF
2. the website will change regularly, so each PDF will soon become out of date

Nonetheless, we appreciate that PDF is sometimes more convenient for reading than a live website.

1.4 Structure of the Course

The first two parts of the course deal with the core Python language and the scientific libraries.

The third part of the course contains easier applications. In these applications, coding strategies are discussed slowly and in depth.

The fourth part of the course is more advanced, and the lectures can be read selectively, according to your interests.


CHAPTER TWO

PROGRAMMING IN PYTHON

This first part of the course provides a relatively fast-paced introduction to the Python programming language.

2.1 About Python

Overview of This Lecture

In this lecture we will

- outline what Python is
- showcase some of its abilities
- compare it to some other languages

When we show you Python code, it is not our intention that you seek to follow all the details, or try to replicate all you see. We will work through all of the Python material step by step later in the lecture series. Our only objective for this lecture is to give you some feel of what Python is, and what it can do.

What's Python?

Python is a general purpose programming language conceived in 1989 by Dutch programmer Guido van Rossum.

Python is free and open source. Community-based development of the core language is coordinated through the Python Software Foundation.

Python is supported by a vast collection of standard and external software libraries.

Python has experienced rapid adoption in the last decade, and is now one of the most popular programming languages. The PYPL index gives some indication of how its popularity has grown.



Common Uses

Python is a general purpose language used in almost all application domains

- communications
- web development
- CGI and graphical user interfaces
- games
- multimedia, data processing, security, etc., etc., etc.

Used extensively by Internet service and high tech companies such as

- Google
- Dropbox
- Reddit
- YouTube
- Walt Disney Animation, etc., etc.

Often used to teach computer science and programming

- Introduction to computer science at edX/MIT
- Computer science 101 at Udacity

For reasons we will discuss, Python is particularly popular within the scientific community

- academia, NASA, CERN, etc.
- meteorology, computational biology, chemistry, machine learning, artificial intelligence, etc., etc.


(To get an idea, you might like to browse some of the seminar topics from the most recent SciPy conference)

Features

- A high level language suitable for rapid development
- Design philosophy emphasizes simplicity and readability
- Relatively small core language supported by many libraries
- A multiparadigm language, in that multiple programming styles are supported (procedural, object-oriented, functional, etc.)
- Interpreted rather than compiled
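For example, the same small computation can be written in a procedural, functional or object-oriented style. The snippet below is a hypothetical illustration of this "multiparadigm" point, not code from the lectures:

```python
# Procedural style: explicit loop mutating a list
def square_all(xs):
    result = []
    for x in xs:
        result.append(x * x)
    return result

# Functional style: map a pure function over the data
squares_functional = list(map(lambda x: x * x, [1, 2, 3]))

# Object-oriented style: behavior attached to a class
class Squarer:
    def apply(self, xs):
        return [x * x for x in xs]

print(square_all([1, 2, 3]))         # [1, 4, 9]
print(squares_functional)            # [1, 4, 9]
print(Squarer().apply([1, 2, 3]))    # [1, 4, 9]
```

All three styles are idiomatic in Python; which to use is a matter of taste and problem structure.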

Scientific Programming

Over the last decade, Python has become one of the core languages of scientific computing. This section briefly showcases some examples of Python for scientific programming. All of these topics will be covered in detail later on.

Numerical programming

Fundamental matrix and array processing capabilities are provided by the excellent NumPy library. NumPy provides the basic array data type plus some simple processing operations.

For example
In [1]: import numpy as np                   # Load the library

In [2]: a = np.linspace(-np.pi, np.pi, 100)  # Create array (even grid from -pi to pi)

In [3]: b = np.cos(a)                        # Apply cosine to each element of a

In [4]: c = np.ones(25)                      # An array of 25 ones

In [5]: np.dot(c, c)                         # Compute inner product
Out[5]: 25.0

The SciPy library is built on top of NumPy and provides additional functionality. For example, let's calculate the integral ∫_{-2}^{2} φ(z) dz, where φ is the standard normal density.
In [5]: from scipy.stats import norm

In [6]: from scipy.integrate import quad

In [7]: phi = norm()

In [8]: value, error = quad(phi.pdf, -2, 2)  # Integrate using Gaussian quadrature

In [9]: value
Out[9]: 0.9544997361036417
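Since this integral is just the probability that a standard normal variable lies in [-2, 2], one way to sanity-check the quadrature result (our own sketch, not from the lecture) is to compare it with the normal CDF:

```python
from scipy.stats import norm
from scipy.integrate import quad

phi = norm()
value, error = quad(phi.pdf, -2, 2)   # numerical integral of the density
exact = phi.cdf(2) - phi.cdf(-2)      # the same probability via the CDF

print(abs(value - exact) < 1e-8)      # prints True: the two agree closely
```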

SciPy includes many of the standard routines used in

- linear algebra
- integration
- interpolation
- optimization
- distributions and random number generation
- signal processing
- etc., etc.

Graphics

The most popular and comprehensive Python library for creating figures and graphs is Matplotlib

- Plots, histograms, contour images, 3D, bar charts, etc., etc.
- Output in many formats (PDF, PNG, EPS, etc.)
- LaTeX integration

[Figure: example 2D plot with embedded LaTeX annotations]
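A minimal Matplotlib session producing a plot like these might look as follows. This is our own sketch: it writes to a file rather than opening a window, and the filename is made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')                      # file-based backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label=r'$\cos(x)$')  # LaTeX markup in the legend
ax.legend()
fig.savefig('cosine_plot.png')             # PNG output; PDF and EPS also work
```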

[Figure: example contour plot]


[Figure: example 3D plot]

More examples can be found in the Matplotlib thumbnail gallery.

Other graphics libraries include

- VPython – 3D graphics and animations
- pyprocessing – a Processing-like graphics environment

Many more exist, but we will use only Matplotlib.

Symbolic Algebra

Sometimes it's useful to be able to manipulate symbolic expressions, in the spirit of Mathematica / Maple. The SymPy library provides this functionality from within the Python shell.
In [10]: from sympy import Symbol

In [11]: x, y = Symbol('x'), Symbol('y')  # Treat 'x' and 'y' as algebraic symbols

In [12]: x + x + x + y
Out[12]: 3*x + y

We can manipulate expressions


In [13]: expression = (x + y)**2 In [14]: expression.expand() Out[14]: x**2 + 2*x*y + y**2

solve polynomials


In [15]: from sympy import solve In [16]: solve(x**2 + x + 2) Out[16]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]

and calculate limits, derivatives and integrals


In [17]: from sympy import limit, sin, diff

In [18]: limit(1 / x, x, 0)
Out[18]: oo

In [19]: limit(sin(x) / x, x, 0)
Out[19]: 1

In [20]: diff(sin(x), x)
Out[20]: cos(x)
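SymPy can also emit LaTeX source for any expression, which is handy for putting results into papers or figures. A small sketch of our own:

```python
from sympy import Symbol, sin, diff, latex

x = Symbol('x')
derivative = diff(sin(x), x)    # symbolic derivative: cos(x)
print(latex(derivative))        # LaTeX source for the expression
```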

The beauty of importing this functionality into Python is that we are working within a fully fledged programming language. We can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.

Statistics

Python's data manipulation and statistics libraries have improved rapidly over the last few years.

Pandas

One of the most popular libraries for working with data is pandas. Pandas is fast, efficient, flexible and well designed.


Here's a simple example


In [21]: import pandas as pd

In [22]: import scipy as sp

In [23]: data = sp.randn(5, 2)  # Create 5x2 matrix of random numbers for toy example

In [24]: dates = pd.date_range('28/12/2010', periods=5)

In [25]: df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)

In [26]: print df
               price    weight
2010-12-28  0.007255  1.129998
2010-12-29 -0.120587 -1.374846
2010-12-30  1.089384  0.612785
2010-12-31  0.257478  0.102297
2011-01-01 -0.350447  1.254644

In [27]: df.mean()
Out[27]:
price     0.176616
weight    0.344975
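Because the example above uses random data, its numbers change on every run. A fully reproducible variant with fixed data (our own sketch) looks like this:

```python
import pandas as pd

# Fixed data instead of random draws, so the output is reproducible
df = pd.DataFrame({'price': [1.0, 2.0, 3.0],
                   'weight': [10.0, 20.0, 30.0]})

means = df.mean()
print(means['price'])    # 2.0
print(means['weight'])   # 20.0
```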

Other Useful Statistics Libraries

- statsmodels – various statistical routines
- scikit-learn – machine learning in Python (sponsored by Google, among others)
- pyMC – for Bayesian data analysis
- pystan – Bayesian analysis based on stan

Networks and Graphs

Python has many libraries for studying graphs. One well-known example is NetworkX

- Standard graph algorithms for analyzing network structure, etc.
- Plotting routines
- etc., etc.

Here's some example code that generates and plots a random graph, with node color determined by shortest path length from a central node
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: nx_demo.py Authors: John Stachurski and Thomas J. Sargent LastModified: 11/08/2013 """


import networkx as nx
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np

G = nx.random_geometric_graph(200, 0.12)  # Generate random graph
pos = nx.get_node_attributes(G, 'pos')    # Get positions of nodes

# Find node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in pos.values()]
ncenter = np.argmin(dists)

# Plot graph, coloring by path length from central node
p = nx.single_source_shortest_path_length(G, ncenter)
plt.figure()
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_nodes(G, pos,
                       nodelist=p.keys(),
                       node_size=120, alpha=0.5,
                       node_color=p.values(),
                       cmap=plt.cm.jet_r)
plt.show()

The figure it produces looks as follows

[Figure: random geometric graph, nodes colored by shortest path length from the central node]

Cloud Computing

Running your Python code on massive servers in the cloud is becoming easier and easier

- An excellent example is Wakari – we'll discuss how to get started with Wakari in the next lecture
- Another alternative is PiCloud

See also

- Amazon Elastic Compute Cloud


- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud

Parallel Processing

Apart from the cloud computing options listed above, you might like to consider

- Parallel computing through IPython clusters
- The Starcluster interface to Amazon's EC2
- GPU programming through Copperhead or PyCUDA

Interfacing with C or Fortran

But isn't Fortran / C faster for scientific computing (along with everything else)?

In one sense the answer is yes: these languages compile into native machine code, which runs very fast. However, it turns out that more lines of scientific code are written in other languages, like Python.

Why is this the case? The reason is that your time is a far more valuable resource than the computer's time. The correct objective function to minimize is

    total time = writing and debugging time + run time

An ideal language would minimize both terms on the right-hand side, but there is a trade-off here

- To minimize the first term, optimize for humans
- To minimize the second term, optimize for computers

Higher level languages such as Python are optimized for humans. Lower level languages such as Fortran and C are optimized for computers. Lower level languages run faster and give greater control, at the cost of taking longer to write and debug

- more details to address (declaring variables, memory allocation/deallocation, etc.)
- requiring boilerplate code, writing of which is error prone and very tedious

For these reasons, the modern scientific paradigm is to combine the strengths of high and low level languages as follows:

1. Write a prototype program in a high-level language such as Python
2. If the program is too slow, then profile it to find out where the bottlenecks are
3. Rewrite those and only those small parts of the code in Fortran / C
4. Rewrite the existing Python program to call this new Fortran / C code when necessary
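The "slow loops offset by fast vectorized operations" trade-off can be seen in a toy comparison. This is our own sketch; exact timings depend on your machine, so none are asserted, only that both approaches agree:

```python
import time
import numpy as np

x = np.random.randn(10**5)

t0 = time.time()
total_loop = 0.0
for v in x:               # interpreted Python loop: one bytecode step per element
    total_loop += v
loop_time = time.time() - t0

t0 = time.time()
total_vec = x.sum()       # vectorized: the loop runs in compiled code inside NumPy
vec_time = time.time() - t0

# Same answer either way; the vectorized version is typically far faster
print(abs(total_loop - total_vec) < 1e-6)
```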


There are many ways to accomplish this; we will cover several in a later lecture.

Other Developments

There are many other interesting developments with scientific programming in Python. Some representative examples include

- IPython notebook – Python in your browser with code cells, embedded images, etc.
- Numba – speed up scientific Python code
- Blaze – a generalization of NumPy
- PyMC – Bayesian statistical modeling in Python
- PyTables – manage large data sets
- CVXPY – convex optimization in Python

Why not MATLAB?

To some extent this question is inevitable, given that MATLAB is still the most common scripting language for numerical computing within economics.

MATLAB and Python are both high quality tools. As for any other pair of tools (a shifting spanner versus a socket wrench, say) neither is better than the other. Moreover, MATLAB and Python are similar in many respects

- Both are high productivity scripting languages, with slow loops offset by fast vectorized operations
- Both have excellent graphics capabilities, and a long list of libraries for scientific computing

Nonetheless, Python has some important strengths that are driving its rapid uptake in scientific computing.

Open Source

One obvious advantage is that Python is free and open source. When you start out with Python, the "free" component of this pair will probably be the most appealing. It means that you, your coauthors and your students can install Python and any of its libraries on all of your computers without cost, or having to bother about licenses.

Over time, however, you will most likely come to value the "open source" property of Python as much, if not more.

The first advantage of open source libraries is that you can read them. For example, let's say you want to know exactly how pandas computes Newey-West covariance matrices.


No problem: you can go ahead and read the code.

While dipping into external library code might seem daunting at first, it's very useful for

1. helping you understand the details of a particular implementation
2. building your programming skills by showing you code written by first rate programmers

A second advantage of open source libraries is that you can change them. In particular, if the functionality provided by a given library is not exactly what you want, you can always open up your trusty editor and modify it. This can also seem daunting to beginners, but with time you will find that it can be extremely useful.

A third advantage of open source libraries is that active projects tend to respond fast to new demand. If, for example, Microsoft modifies the format of its Excel files, new Python readers and decoders will rapidly appear.

A fourth, more philosophical advantage of open source software is that it conforms to the scientific ideal of reproducibility. Since all the source code is visible, research you produce using Python will be more open, more transparent and hence more reproducible.

Flexibility and Broad Scope

As mentioned above, Python is a general purpose programming language with a wide range of applications. As such, it can be used for just about any task you are likely to face, from dynamic programming to web scraping, forecasting with support vector machines, sentiment analysis via Twitter, building a graphical front end for an experiment, or sending yourself emails to remind you of your mother's birthday.

Moreover, Python has a vibrant and friendly community, and a massive array of third party libraries for almost any purpose. To learn more, you might like to

- Browse some Python projects on GitHub
- Have a look at some of the IPython notebooks people have shared on various scientific topics
- Visit the Python Package Index
- View some of the questions people are asking about Python on Stackoverflow
- Keep up to date on what's happening in the Python community with the Python subreddit
- Etc.

Syntax and Design

Another nice feature of Python is its elegant syntax; we'll see many examples later on.


"Elegant code" might sound superfluous, but in fact it's highly beneficial because it makes the syntax easy to read and easy to remember. Remembering how to read from files, sort dictionaries and other such routine tasks means that you don't need to break your flow of thought in order to hunt down correct syntax on the Internet.

Closely related to elegant syntax is elegant design. Features like iterators, generators, decorators, list comprehensions, etc. make Python highly expressive, allowing you to get more done with less code. Namespaces improve productivity by cutting down on bugs and syntax errors.
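A few of the features just mentioned, in one hypothetical snippet (not from the lectures):

```python
# List comprehension: build a list in one readable expression
squares = [n * n for n in range(5)]

# Generator: produce values lazily, one at a time
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Decorator: wrap a function to modify its behavior
def twice(f):
    def wrapper(x):
        return f(f(x))
    return wrapper

@twice
def add_three(x):
    return x + 3

print(squares)              # [0, 1, 4, 9, 16]
print(list(countdown(3)))   # [3, 2, 1]
print(add_three(1))         # 7, since three is added twice
```

Each of these features is covered properly in the language lectures that follow.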

How About Julia?

Julia is an open source language for scientific computing launched in early 2012. Julia is similar to MATLAB and Python in the sense that it is a dynamically typed scripting language suitable for rapid development. Julia also offers the promise of fast loops by using a just-in-time compiler.

We are excited about Julia, and believe it will turn into a very useful tool. On the other hand, Julia is still well below version 1, and hence might break backwards compatibility at some stage over the coming year or two. (This is a good thing: it's not good for a project to lock down its design too early.) In addition, Julia lacks the massive scientific programming ecosystem that has gathered around Python.

We will wait until Julia has developed a bit more, and then consider treating Julia in our lectures alongside Python.

Additional comments:

- Work has already begun on integrating Julia and Python, so that Julia code can be called from Python code and vice versa
- Fast loops are already available in Python via Numba or Parakeet, although these projects are still works in progress

2.2 Setting up Your Python Environment


The objective of this lecture is to help you

1. Get a Python environment up and running with all the necessary tools
2. Install the Python programs that underpin these lectures
3. Make sure that you know how to run these programs with Python
4. Learn how to modify them, and also write your own scripts


Warning: The core Python package is easy to install, but is most likely not what you should choose for these lectures. The reason is that these lectures require the entire scientific programming ecosystem, which the core installation doesn't provide. Please read the following carefully.

If you have problems with what follows, feel free to contact us via john.stachurski@gmail.com; we might not be able to help, but we'll try to give suggestions and improve these notes from your feedback.

Installation
Let's start with the most standard set-up: an installation of all the necessary bits and pieces onto the machine sitting in front of you. The key components to install are

1. Python and its scientific libraries
2. A decent text editor, for editing Python scripts
3. The Python files for this course

For now we'll assume you are running Windows or Mac OS; there's a brief discussion of Linux below.

Python Distributions

For Windows and Mac OS, the best thing to do is to install one of the distributions that contains Python and the scientific libraries. The three best known are

- Anaconda
- Canopy
- Python(x,y) (Windows only)

All are free, or have free versions. Opinions differ as to which distribution is the best. Lately we hear good things about Anaconda, and Anaconda works on all platforms, so we're going to recommend it as the one you should install.

Note: Some Windows users have reported finding Python(x,y) to be more straightforward and more stable. If you are using Windows and have problems with Anaconda, please try uninstalling Anaconda and installing Python(x,y).

You might like to start the download process now, making sure you choose the right version for your operating system. For Anaconda, note:


- If you already have some other Python distro installed, you might be better off uninstalling before you start
- If you are asked during the installation process whether you'd like to make Anaconda your default Python installation, go ahead and say yes
- Otherwise, you can accept all of the defaults

Text Editors

The next thing you'll need to install is a decent text editor. A text editor is the application you will use to write and edit Python programs. Perhaps you already have a favorite text editor that knows how to interact with Python.

One that we recommend is Sublime Text, a popular and highly regarded text editor with a relatively moderate learning curve. Sublime Text is not free, but it does have an unlimited trial period, so you can take your time and see if you like it. However, there are many others, and a lot of them are free; you can find out more by googling for Python text editors.

If you want a top quality free editor and don't mind a sharper learning curve, try Emacs. If you want an outstanding free text editor and don't mind a seemingly vertical learning curve plus long days of pain and suffering while all your neural pathways are rewired, try Vim.

Git

Another piece of software we suggest you install is Git. Git is a tool for managing collections of files, typically software libraries. Very often, these libraries of code, called repositories, are stored on GitHub, a free hosting service. Our repository is no exception; you can find it here.

There are two ways to download a repository from GitHub: either using Git, or just downloading the zip file. If you're happy with the latter, then you can go ahead; you'll see the Download ZIP button on the right-hand side of the main page. (Make sure you remember where you unzip the directory, and make it somewhere you can easily navigate to.) However, learning about Git is an excellent investment, and we recommend that you go ahead and download it.

Note: We are pointing you to the plain vanilla command line version of Git, since we find it best ourselves.
If you use Windows and have trouble with the instructions below, one option is to use the fancier GUI version you can find here.


Obtaining the Main Repository

As discussed above, if you definitely don't want to learn Git, then you can just download the zip file containing the main repository, and open it in a sensible place. (In all of what follows, the term "main repository" always refers to our GitHub repository.)

Alternatively, assuming that you've installed Git, you can do the following. Open up a terminal (Mac OS) or Powershell (Windows). Here's a picture of the Powershell; you can click on it to enlarge

If you look closely, you'll see that we've typed two commands at the prompt. The first was cd .\Documents; in fact we just typed cd Doc and hit the Tab key, and Powershell guessed the rest (cd is "change directory"). The purpose of this command was to take us to a reasonable location to put the main repository. The next command we typed was
git clone https://github.com/jstac/quant-econ

This looks complicated, but it's just git clone in front of the URL for our main repository.

Note: Did you get an error message? Are you using Windows? It might be that Powershell can't find Git. If so, please follow these instructions. (Thanks to Tom Ward for the link.)

In response to this command we see some output, and if we now list the current directory (type ls) we see a new directory called quant-econ. Now let's enter that directory, via cd quant-econ (or cd qua and then Tab), and then the subdirectory programs, via cd programs (cd pr and then Tab). Here's the picture so far (click to enlarge). Now if you type ls for "list", you should see a whole lot of Python files; it means you're done installing the repository.


As one final step, try typing git pull (without doing anything else first, like changing directories). Git will tell you that you are already up to date, but if there had been updates to the repository, then they would have been pulled in. In general, you should type git pull each time before you start work.

Running Python Programs


Now you're ready to run one of the Python programs from the main repository. We'll assume that you are where we just left off, with Powershell or a terminal open and currently in the programs subdirectory of quant-econ. To run Python programs we're going to use IPython, which is a much improved version of the plain Python command shell.

First Steps with IPython

Now type ipython, and, assuming your installation of Anaconda has finished, you should see something like this (as usual, click to enlarge)


Now try typing import numpy, import scipy, import matplotlib and import pandas one after another, as follows

A lack of error messages indicates that installation of the main scientific libraries has succeeded. Now, if you look at the last figure, you'll see that the final command is run white_noise_plot.py. The file white_noise_plot.py contains a very simple program from the quant-econ repository. If you run it via run white_noise_plot.py, you should see a figure pop up with a line plot of some white noise, like so

If you got an error message, the most likely cause is that you are in the wrong directory. Try typing pwd at the IPython prompt (pwd = "present working directory"). Where you want to be is in the programs subdirectory of quant-econ. If you aren't, use cd .. (go back one level), pwd (tell me where I am) and cd <directory_name_here> (enter desired directory) until you're back in programs.


Now the command run white_noise_plot.py should work. Let's recap what we've learned about IPython so far:

1. run foo.py runs the program foo.py, provided it is in the present working directory
2. Navigation commands such as cd, pwd and ls work just as well inside IPython as they do in the terminal / Powershell

Now let's explore some other important features.

Command History

Type any command, such as print 'foo'. Now hit the up arrow key, and then return. Exactly the same thing should come up; the up arrow brings the previously typed command to the prompt. This saves a lot of typing.

Tab Completion

Another nice feature of IPython is tab completion. For example, enter import numpy and then type from numpy import ran and, without hitting Return, hit the Tab key. IPython should offer up the only two possible completions, random and rank. If you type from numpy import rand and then Tab, the last word will expand to random, since this is the only possibility. In this way, the Tab key helps remind you of what's available, and also saves you plenty of typing.

On-Line Help

At the IPython prompt, try typing max? You should see the on-line help for the max function displayed. More generally, name? brings up help on name, provided that IPython knows what name is. Sometimes, when the help page is long, you will be dropped into a pager, from which you can exit by typing q. For example, try import numpy and then numpy.random.randn?

Putting it All Together

We've come a long way: we've installed all the software we need, and are able to run programs. The other major thing we want to accomplish in this lecture is to learn how to modify programs, or write our own. This is actually pretty easy: just launch your text editor and then open up the relevant file. Here we've started Sublime Text and then opened the file white_noise_plot.py from the main repository (click to enlarge). What you see here is the standard scientific Python programming set up:


- A text editor to edit your programs
- The IPython shell to run them

Here's another example, using Vim on a Linux box

Try replicating this text editor / IPython set up and running white_noise_plot.py. Now try changing the b- to r- in the line plot(x, 'b-', label="white noise"). Save the file, and then run again with IPython. The blue line from the white noise figure should now turn red.

How About Linux?


So far our discussion has centered around Windows and Mac OS. If you use Ubuntu Linux, say, then the installation procedure is very similar, except that you can probably install all the software you need with apt-get.


For example, start with


sudo apt-get install python-scipy

and then install pandas, matplotlib, git, etc. in the same way. If you find that you need more up-to-date versions of these packages, consider using virtualenv and pip.

Exercises
Exercise 1 Sign up to GitHub; it's free. Look into "forking" GitHub repositories. (Loosely speaking, forking means making your own copy of a GitHub repository, stored on GitHub.) Try forking the main repository for the course. Now try cloning it to some local directory, making edits, adding and committing them, and pushing them back up to your forked GitHub repo. See here for help.

2.3 An Introductory Example


We're now ready to start learning the Python language itself, and the next few lectures are devoted to this task. Our approach is aimed at those who already have at least some knowledge of fundamental programming concepts, such as

- variables
- for loops, while loops
- conditionals (if/else)

Don't give up if you have no programming experience; you are not excluded. You just need to cover some of the fundamentals of programming before returning here. Two good references for first time programmers are

- Learn Python the Hard Way
- How to Think Like a Computer Scientist (the first 5 or 6 chapters)

Overview of This Lecture


In this lecture we will write and then pick apart small Python programs


The objective is to introduce you to basic Python syntax and data structures. Deeper concepts (how things work) will be covered in later lectures.

In reading the following, you should be conscious of the fact that all first programs are to some extent contrived. We try to avoid this, but nonetheless

- Be aware that the programs are written to illustrate certain concepts
- By the time you finish the course, you will be writing the same programs in a rather different, and more efficient, way
- In particular, the scientific libraries will allow us to accomplish the same things much faster and more efficiently, once we know how to use them

However, you also need to learn pure Python, the core language. This is the objective of the present lecture, and the next few lectures too.

Prerequisites: You should already know

- How to get a copy of the programs written for this course
- How to run these (or any other) Python programs through the Python interpreter

If you're not sure about either, then please return to this lecture.

First Example: Plotting a White Noise Process


To begin, let's suppose that we want to simulate and plot the white noise process ε_0, ε_1, . . . , ε_T, where each draw ε_t is independent standard normal. In other words, we want to generate figures that look something like this:

Here's a program that accomplishes what we want

1  import pylab
2  from random import normalvariate
3  ts_length = 100
4  epsilon_values = []   # An empty list
5  for i in range(ts_length):
6      e = normalvariate(0, 1)
7      epsilon_values.append(e)
8  pylab.plot(epsilon_values, 'b-')
9  pylab.show()

The program can be found in the file test_program_1.py from the main repository. In brief,

- Lines 1-2 use the Python import keyword to pull in functionality from external libraries
- Line 3 sets the desired length of the time series


- Line 4 creates an empty list called epsilon_values that will store the ε_t values as we generate them
- Line 5 tells the Python interpreter that it should cycle through the block of indented lines (lines 6-7) ts_length times before continuing to line 8
- Lines 6-7 draw a new value ε_t and append it to the end of the list epsilon_values
- Lines 8-9 generate the plot and display it to the user

Let's now break this down and see how the different parts work.

Import Statements

First, consider the lines
1  import pylab
2  from random import normalvariate

Here pylab and random are two separate modules. A module is a file, or a hierarchy of linked files, containing code that can be read by the Python interpreter. Importing a module causes the Python interpreter to run the code in those files. After importing a module, we can access anything defined within the module via module_name.attribute_name syntax
In [1]: import random

In [2]: random.normalvariate(0, 1)
Out[2]: -0.12451500570438317


In [3]: random.uniform(-1, 1)
Out[3]: 0.35121616197003336

Alternatively, we can import attributes from the module directly


In [4]: from random import normalvariate, uniform

In [5]: normalvariate(0, 1)
Out[5]: -0.38430990243287594

In [6]: uniform(-1, 1)
Out[6]: 0.5492316853602877

Both approaches are in common use.

Lists

Next let's consider the statement epsilon_values = [], which creates an empty list. Lists are a native Python data structure used to group a collection of objects. For example
In [7]: x = [10, 'foo', False]   # We can include heterogeneous data inside a list

In [8]: type(x)
Out[8]: list

Here the first element of x is an integer, the next is a string and the third is a Boolean value. When adding a value to a list, we can use the syntax list_name.append(some_value)
In [9]: x
Out[9]: [10, 'foo', False]

In [10]: x.append(2.5)

In [11]: x
Out[11]: [10, 'foo', False, 2.5]

Here append() is what's called a method, which is a function attached to an object (in this case, the list x). We'll learn all about methods later on, but just to give you some idea, Python objects such as lists, strings, etc. all have methods that are used to manipulate the data contained in the object. String objects have string methods, list objects have list methods, etc. Another useful list method is pop()
In [12]: x
Out[12]: [10, 'foo', False, 2.5]

In [13]: x.pop()
Out[13]: 2.5


In [14]: x
Out[14]: [10, 'foo', False]

The full set of list methods can be found here. Following C, C++, Java, etc., lists in Python are zero-based
In [15]: x
Out[15]: [10, 'foo', False]

In [16]: x[0]
Out[16]: 10

In [17]: x[1]
Out[17]: 'foo'

Returning to test_program_1.py above, we actually create a second list besides epsilon_values. In particular, line 5 calls the range() function, which creates sequential lists of integers
In [18]: range(4)
Out[18]: [0, 1, 2, 3]

In [19]: range(5)
Out[19]: [0, 1, 2, 3, 4]

The For Loop

Now let's consider the for loop in test_program_1.py, which we repeat here for convenience, along with the line that follows it
for i in range(ts_length):
    e = normalvariate(0, 1)
    epsilon_values.append(e)
pylab.plot(epsilon_values, 'b-')

The for loop causes Python to execute the two indented lines a total of ts_length times before moving on. These two lines are called a code block, since they comprise the block of code that we are looping over. Unlike most other languages, Python knows the extent of the code block only from indentation. In particular, the fact that indentation decreases after line epsilon_values.append(e) tells Python that this line marks the lower limit of the code block. More on indentation below; for now let's look at another example of a for loop
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")

If you put this in a text le and run it you will see


The plural of dog is dogs
The plural of cat is cats
The plural of bird is birds

This example helps to clarify how the for loop works: When we execute a loop of the form
for variable_name in sequence:
    <code block>

The Python interpreter performs the following: For each element of sequence, it "binds" the name variable_name to that element and then executes the code block. The sequence object can in fact be a very general object, as we'll see soon enough.

Code Blocks and Indentation

In discussing the for loop, we explained that the code blocks being looped over are delimited by indentation. In fact, in Python all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation. Thus, unlike most other languages, whitespace in Python code affects the output of the program. Once you get used to it, this is a very good thing because it

- forces clean, consistent indentation, which improves readability
- removes clutter, such as the brackets or end statements used in other languages

On the other hand, it takes a bit of care to get right, so please remember:

- The line before the start of a code block always ends in a colon
  (for i in range(10):, if x > y:, while x < 100:, etc.)
- All lines in a code block must have the same amount of indentation
- The Python standard is 4 spaces, and that's what you should use

Tabs vs Spaces

One small "gotcha" here is the mixing of tabs and spaces. (Important: Within text files, the internal representation of tabs and spaces is not the same.) You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so. Here's the relevant Sublime Text documentation. Here's a screenshot of correct tab configuration for the Gedit text editor
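To make these rules concrete, here is a minimal sketch of our own in which each colon opens a code block and four-space indentation marks its extent:

```python
# Each colon opens a code block; consistent 4-space indentation
# delimits the block's extent.
total = 0
for i in range(3):          # the line before a code block ends in a colon
    if i > 0:               # a nested block is indented one further level
        total = total + i   # 4 spaces per level, the Python standard
print(total)                # i = 0 is skipped, so total = 1 + 2 = 3
```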


While Loops

The for loop is the most common technique for iteration in Python. But, for the purpose of illustration, let's modify test_program_1.py to use a while loop instead. In Python, the while loop syntax is as shown in the file test_program_2.py below

1   import pylab
2   from random import normalvariate
3   ts_length = 100
4   epsilon_values = []
5   i = 0
6   while i < ts_length:
7       e = normalvariate(0, 1)
8       epsilon_values.append(e)
9       i = i + 1
10  pylab.plot(epsilon_values, 'b-')
11  pylab.show()

The output of test_program_2.py is identical to test_program_1.py above (modulo randomness). Comments:

- The code block for the while loop is lines 7-9, again delimited only by indentation
- The statement i = i + 1 can be replaced by i += 1

User-Defined Functions

Now let's go back to the for loop, but restructure our program to make the logic clearer


To this end, we will break our program into two parts:

1. A user-defined function that generates a list of random variables
2. The main part of the program that (a) calls this function to get data and (b) plots the data

This is accomplished in test_program_3.py

1   import pylab
2   from random import normalvariate
3
4   def generate_data(n):
5       epsilon_values = []
6       for i in range(n):
7           e = normalvariate(0, 1)
8           epsilon_values.append(e)
9       return epsilon_values
10
11
12  data = generate_data(100)
13  pylab.plot(data, 'b-')
14  pylab.show()

Let's go over this carefully, in case you're not familiar with functions and how they work. We have defined a function called generate_data(), where the definition spans lines 4-9

- def on line 4 is a Python keyword used to start function definitions
- def generate_data(n): indicates that the function is called generate_data, and that it has a single argument n
- Lines 5-9 are a code block called the function body; in this case it creates an iid list of random draws using the same logic as before
- Line 9 indicates that the list epsilon_values is the object that should be returned to the calling code

This whole function definition is read by the Python interpreter and stored in memory. When the interpreter gets to the expression generate_data(100) in line 12, it executes the function body (lines 5-9) with n set equal to 100. The net result is that the name data on the left-hand side of line 12 is set equal to the list epsilon_values returned by the function.

Conditions

Our function generate_data() is rather limited. Let's make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on (0, 1) as required. This is achieved in test_program_4.py by adding the argument generator_type to generate_data()


1   import pylab
2   from random import normalvariate, uniform
3
4   def generate_data(n, generator_type):
5       epsilon_values = []
6       for i in range(n):
7           if generator_type == 'U':
8               e = uniform(0, 1)
9           else:
10              e = normalvariate(0, 1)
11          epsilon_values.append(e)
12      return epsilon_values
13
14  data = generate_data(100, 'U')
15  pylab.plot(data, 'b-')
16  pylab.show()

Comments:

- Hopefully the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks
- We are passing the argument U as a string, which is why we write it as 'U'
- Notice that equality is tested with the == syntax, not =
  (For example, the statement a = 10 assigns the name a to the value 10, whereas the expression a == 10 evaluates to either True or False, depending on the value of a)

Now, there are two ways that we can simplify test_program_4. First, Python accepts the following conditional assignment syntax
In [20]: x = -10

In [21]: s = 'negative' if x < 0 else 'nonnegative'

In [22]: s
Out[22]: 'negative'

which leads us to test_program_5.py


1   import pylab
2   from random import normalvariate, uniform
3
4   def generate_data(n, generator_type):
5       epsilon_values = []
6       for i in range(n):
7           e = uniform(0, 1) if generator_type == 'U' else normalvariate(0, 1)
8           epsilon_values.append(e)
9       return epsilon_values
10
11  data = generate_data(100, 'U')
12  pylab.plot(data, 'b-')
13  pylab.show()


Second, and more importantly, we can get rid of the conditionals altogether by just passing the desired generator type as a function. To understand this, consider test_program_6.py

1   import pylab
2   from random import normalvariate, uniform
3
4   def generate_data(n, generator_type):
5       epsilon_values = []
6       for i in range(n):
7           e = generator_type(0, 1)
8           epsilon_values.append(e)
9       return epsilon_values
10
11  data = generate_data(100, uniform)
12  pylab.plot(data, 'b-')
13  pylab.show()

The only lines that have changed here are lines 7 and 11

- In line 11, when we call the function generate_data(), we pass uniform as the second argument
- The object uniform is in fact a function, defined in the random module
In [23]: from random import uniform

In [24]: uniform(0, 1)
Out[24]: 0.2981045489306786

When the function call generate_data(100, uniform) on line 11 is executed, Python runs the code block on lines 5-9 with n equal to 100 and the name generator_type "bound" to the function uniform. While these lines are executed, the names generator_type and uniform are "synonyms", and can be used in identical ways. This principle works more generally; for example, consider the following piece of code
In [25]: max(7, 2, 4)   # max() is a built-in Python function
Out[25]: 7

In [26]: m = max

In [27]: m(7, 2, 4)
Out[27]: 7

Here we created another name for the built-in function max(), which could then be used in identical ways. In the context of our program, the ability to bind new names to functions means that there is no problem passing a function as an argument to another function, as we do in line 11


List Comprehensions

Now is probably a good time to tell you that we can simplify the code for generating the list of random draws considerably by using something called a list comprehension. List comprehensions are an elegant Python tool for creating lists. Consider the following example, where the list comprehension is on the right-hand side of the second line
In [28]: animals = ['dog', 'cat', 'bird']

In [29]: plurals = [animal + 's' for animal in animals]

In [30]: plurals
Out[30]: ['dogs', 'cats', 'birds']

Here's another example


In [31]: range(8)
Out[31]: [0, 1, 2, 3, 4, 5, 6, 7]

In [32]: doubles = [2 * x for x in range(8)]

In [33]: doubles
Out[33]: [0, 2, 4, 6, 8, 10, 12, 14]

With the list comprehension syntax, we can simplify the lines


epsilon_values = []
for i in range(n):
    e = generator_type(0, 1)
    epsilon_values.append(e)

into
epsilon_values = [generator_type(0, 1) for i in range(n)]
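To convince yourself that the loop and the comprehension build the same list, you can swap in a deterministic stand-in for the random generator (the generator_type below is our hypothetical substitute, not part of the course code):

```python
# With a deterministic stand-in for the random generator, the for loop
# and the list comprehension produce identical lists.
def generator_type(a, b):  # hypothetical stand-in for normalvariate/uniform
    return a + b

n = 5
epsilon_values = []
for i in range(n):
    e = generator_type(0, 1)
    epsilon_values.append(e)

comprehension = [generator_type(0, 1) for i in range(n)]
print(epsilon_values == comprehension)  # True
```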

Using the Scientific Libraries

As discussed at the start of the lecture, our example is somewhat contrived. In practice we would use the scientific libraries, which can generate large arrays of independent random draws much more efficiently. For example, try
In [34]: from numpy.random import randn

In [35]: epsilon_values = randn(5)

In [36]: epsilon_values
Out[36]: array([-0.15591709, -1.42157676, -0.67383208, -0.45932047, -0.17041278])

We'll discuss these scientific libraries a bit later on


Exercises
Exercise 1 Recall that n! is read as "n factorial" and defined as n! = n × (n − 1) × · · · × 2 × 1. There are functions to compute this in various modules, but let's write our own version as an exercise. In particular, write a function factorial such that factorial(n) returns n! for any positive integer n. Solution: View solution

Exercise 2 The binomial random variable Y ~ Bin(n, p) represents the number of successes in n binary trials, where each trial succeeds with probability p. Without any import besides from random import uniform, write a function binomial_rv such that binomial_rv(n, p) generates one draw of Y.

Hint: If U is uniform on (0, 1) and p ∈ (0, 1), then the expression U < p evaluates to True with probability p. Solution: View solution

Exercise 3 Compute an approximation to π using Monte Carlo. Use no imports besides
from random import uniform
from math import sqrt

Your hints are as follows:

- If U is a bivariate uniform random variable on the unit square (0, 1)², then the probability that U lies in a subset B of (0, 1)² is equal to the area of B
- If U_1, . . . , U_n are iid copies of U, then, as n gets large, the fraction that fall in B converges to the probability of landing in B
- For a circle, area = pi * radius^2

Solution: View solution

Exercise 4 Write a program that prints one realization of the following random device:

- Flip an unbiased coin 10 times
- If 3 consecutive heads occur one or more times within this sequence, pay one dollar
- If not, pay nothing

Use no import besides from random import uniform. Solution: View solution


Exercise 5 Your next task is to simulate and plot the correlated time series

x_{t+1} = α x_t + ε_{t+1},  where x_0 = 0 and t = 0, . . . , T

The sequence of shocks {ε_t} is assumed to be iid and standard normal. In your solution, restrict your import statements to
from pylab import plot, show
from random import normalvariate

Set T = 200 and α = 0.9. Solution: View solution

Exercise 6 To do the next exercise, you will need to know how to produce a plot legend. The following example should be sufficient to convey the idea
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: white_noise_plot.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ from pylab import plot, show, legend from random import normalvariate x = [normalvariate(0, 1) for i in range(100)] plot(x, 'b-', label="white noise") legend() show()

Running it produces a figure like so. Now, starting with your solution to exercise 5, plot three simulated time series, one for each of the cases α = 0, α = 0.8 and α = 0.98. In particular, you should produce (modulo randomness) a figure that looks as follows. (The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.) In your solution, please restrict your import statements to
from pylab import plot, show, legend
from random import normalvariate

Also, use a for loop to step through the α values. Important hints:

- If you call the plot() function multiple times before calling show(), all of the lines you produce will end up on the same figure



- And if you omit the argument 'b-' to the plot function, pylab will automatically select different colors for each line
- The expression 'foo' + str(42) evaluates to 'foo42'

Solution: View solution

2.4 Python Essentials


In this lecture we'll cover features of the language that are essential to reading and writing Python

Overview of This Lecture


Topics:

- Some important data types that we haven't covered yet
- Basic file I/O
- The Pythonic approach to iteration
- More on user-defined functions
- Comparisons and logic
- Standard Python style

More Data Types


So far we've met several common data types, including strings, integers, floats and lists. Let's review some other common types.

Primitive Data Types

One very simple data type is Boolean values, which can be either True or False
In [1]: x = True

In [2]: y = 100 < 10   # Python evaluates expression on right and assigns it to y

In [3]: y
Out[3]: False

In [4]: type(y)
Out[4]: bool

In arithmetic expressions, True is converted to 1 and False is converted to 0


In [5]: x + y
Out[5]: 1

In [6]: x * y
Out[6]: 0

In [7]: True + True
Out[7]: 2

In [8]: bools = [True, True, False, True]   # List of Boolean values

In [9]: sum(bools)
Out[9]: 3

This is called Boolean arithmetic, and we will use it a great deal Complex numbers are another primitive data type in Python
In [10]: x = complex(1, 2) In [11]: y = complex(2, 1) In [12]: x * y Out[12]: 5j

There are several more primitive data types that we'll introduce as necessary.

Containers

Python has several basic types for storing collections of (possibly heterogeneous) data.

We have already discussed lists.

A related data type is tuples, which are immutable lists

    In [13]: x = ('a', 'b')   # Round brackets instead of the square brackets used for lists

    In [14]: x = 'a', 'b'     # Or no brackets at all---the meaning is identical

    In [15]: x
    Out[15]: ('a', 'b')

    In [16]: type(x)
    Out[16]: tuple

In Python, an object is called immutable if, once created, the object cannot be changed.

Lists are mutable while tuples are not

    In [17]: x = [1, 2]    # Lists are mutable

    In [18]: x[0] = 10     # Now x = [10, 2], so the list has "mutated"

    In [19]: x = (1, 2)    # Tuples are immutable


    In [20]: x[0] = 10
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-21-6cb4d74ca096> in <module>()
    ----> 1 x[0] = 10

    TypeError: 'tuple' object does not support item assignment

We'll say more about mutable vs immutable a bit later, and explain why the distinction is important.

Tuples (and lists) can be unpacked as follows

    In [21]: integers = (10, 20, 30)

    In [22]: x, y, z = integers

    In [23]: x
    Out[23]: 10

    In [24]: y
    Out[24]: 20

You've actually seen an example of this already.

Tuple unpacking is convenient and we'll use it often.

Two other container types we should mention before moving on are sets and dictionaries.

Dictionaries are much like lists, except that the items are named instead of numbered

    In [25]: d = {'name': 'Frodo', 'age': 33}

    In [26]: type(d)
    Out[26]: dict

    In [27]: d['age']
    Out[27]: 33

The names 'name' and 'age' are called the keys.

The objects that the keys are mapped to ('Frodo' and 33) are called the values.

Sets are unordered collections without duplicates, and set methods provide the usual set theoretic operations

    In [28]: s1 = {'a', 'b'}

    In [29]: type(s1)
    Out[29]: set

    In [30]: s2 = {'b', 'c'}

    In [31]: s1.issubset(s2)


    Out[31]: False

    In [32]: s1.intersection(s2)
    Out[32]: set(['b'])

The set() function creates sets from sequences


    In [33]: s3 = set(('foo', 'bar', 'foo'))   # Unique elements only

    In [34]: s3
    Out[34]: set(['foo', 'bar'])

Input and Output


Let's have a quick look at basic file input and output.

We discuss only reading and writing to text files.

Let's start with writing

    In [35]: f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing

    In [36]: f.write('Testing\n')           # Here '\n' means new line

    In [37]: f.write('Testing again')

    In [38]: f.close()

Here

- The built-in function open() creates a file object for writing to
- Both write() and close() are methods of file objects

Where is this file that we've created?

Recall that Python maintains a concept of the current working directory (cwd) that can be located by

    import os
    print os.getcwd()

(In the IPython notebook, pwd should also work)

If a path is not specified, then this is where Python writes to.

You can confirm that the file newfile.txt is in your cwd using a file browser or some other method

(In IPython, use ls to list the files in the cwd)

We can also use Python to read the contents of newfile.txt as follows


    In [39]: f = open('newfile.txt', 'r')

    In [40]: out = f.read()

    In [41]: out
    Out[41]: 'Testing\nTesting again'

    In [42]: print out
    Testing
    Testing again

Paths

Note that if newfile.txt is not in the cwd then this call to open() fails.

In this case you can either specify the full path to the file
In [43]: f = open('insert_full_path_to_file/newfile.txt', 'r')

or change the current working directory to the location of the file via os.chdir('path_to_file')

(In IPython, use cd to change directories)

Details are OS specific, but a Google search on paths and Python should yield plenty of examples
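To make the two options concrete, here's a small sketch (it uses /tmp as a stand-in for the file's actual location — substitute your own path):

```python
import os

# Option 2: make the file's directory the current working directory
os.chdir('/tmp')
f = open('newfile.txt', 'w')   # created in /tmp, since that is now the cwd
f.write('Testing\n')
f.close()

# Option 1: give open() the full path to the file instead
f = open('/tmp/newfile.txt', 'r')
print(f.read())
f.close()
```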

Iterating
One of the most important tasks in computing is stepping through a sequence of data and performing a given action.

One of Python's strengths is its simple, flexible interface to this kind of iteration via the for loop.

Looping over Different Objects

Many Python objects are iterable, in the sense that they can be placed to the right of in within a for loop statement.

To give an example, suppose that we have a file called us_cities.txt listing US cities and their population
    new york: 8244910
    los angeles: 3819702
    chicago: 2707120
    houston: 2145146
    philadelphia: 1536471
    phoenix: 1469471
    san antonio: 1359758
    san diego: 1326179
    dallas: 1223229

Suppose that we want to make the information more readable, by capitalizing names and adding commas to mark thousands.

The following program reads the data in and makes the conversion


    1  data_file = open('us_cities.txt', 'r')
    2  for line in data_file:
    3      city, population = line.split(':')             # Tuple unpacking
    4      city = city.title()                            # Capitalize city names
    5      population = '{0:,}'.format(int(population))   # Add commas to numbers
    6      print(city.ljust(15) + population)
    7  data_file.close()

Here format() is a powerful string method used for inserting variables into strings.

The output is as follows

    New York       8,244,910
    Los Angeles    3,819,702
    Chicago        2,707,120
    Houston        2,145,146
    Philadelphia   1,536,471
    Phoenix        1,469,471
    San Antonio    1,359,758
    San Diego      1,326,179
    Dallas         1,223,229

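If you'd like to see those string methods before then, here's a quick sketch, tried in isolation on one sample line from the data file:

```python
line = 'new york: 8244910'
city, population = line.split(':')          # split at the colon
print(city.title())                         # 'New York'  -- capitalizes each word
print('{0:,}'.format(int(population)))      # '8,244,910' -- inserts thousands commas
print(city.title().ljust(15) + '|')         # pads the name to 15 characters
```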
The reformatting of each line is the result of three different string methods, the details of which can be left till later.

The interesting part of this program for us is line 2, which shows that

1. The file object data_file is iterable, in the sense that it can be placed to the right of in within a for loop
2. Iteration steps through each line in the file

This leads to the clean, convenient syntax shown in our program.

Many other kinds of objects are iterable, and we'll discuss some of them later on.

Looping without Indices

One thing you might have noticed is that Python tends to favor looping without explicit indexing.

For example,

    for x in x_values:
        print x * x

is preferred to

    for i in range(len(x_values)):
        print x_values[i] * x_values[i]

When you compare these two alternatives, you can see why the first one is preferred.

Python provides some facilities to simplify looping without indices.

One is zip(), which is used for stepping through pairs from two sequences.

For example, try running the following code


    countries = ('Japan', 'Korea', 'China')
    cities = ('Tokyo', 'Seoul', 'Beijing')
    for country, city in zip(countries, cities):
        print 'The capital of {0} is {1}'.format(country, city)

The zip() function is also useful for creating dictionaries; for example

    In [1]: names = ['Tom', 'John']

    In [2]: marks = ['E', 'F']

    In [3]: dict(zip(names, marks))
    Out[3]: {'John': 'F', 'Tom': 'E'}

If we actually need the index from a list, one option is to use enumerate().

To understand what enumerate() does, consider the following example

    letter_list = ['a', 'b', 'c']
    for index, letter in enumerate(letter_list):
        print "letter_list[{0}] = '{1}'".format(index, letter)

The output of the loop is


    letter_list[0] = 'a'
    letter_list[1] = 'b'
    letter_list[2] = 'c'

Comparisons and Logical Operators


Comparisons

Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or False).

A common type is comparisons, such as

    In [44]: x, y = 1, 2

    In [45]: x < y
    Out[45]: True

    In [46]: x > y
    Out[46]: False

One of the nice features of Python is that we can chain inequalities


    In [47]: 1 < 2 < 3
    Out[47]: True

    In [48]: 1 <= 2 <= 3
    Out[48]: True

As we saw earlier, when testing for equality we use ==


    In [49]: x = 1      # Assignment

    In [50]: x == 2     # Comparison
    Out[50]: False

For not equal use !=


    In [51]: 1 != 2
    Out[51]: True

Note that when testing conditions, we can use any valid Python expression
    In [52]: x = 'yes' if 42 else 'no'

    In [53]: x
    Out[53]: 'yes'

    In [54]: x = 'yes' if [] else 'no'

    In [55]: x
    Out[55]: 'no'

What's going on here? The rule is:

- Expressions that evaluate to zero, empty sequences/containers (strings, lists, etc.) and None are equivalent to False
- All other values are equivalent to True

Combining Expressions

We can combine expressions using and, or and not.

These are the standard logical connectives (conjunction, disjunction and denial)

    In [56]: 1 < 2 and 'f' in 'foo'
    Out[56]: True

    In [57]: 1 < 2 and 'g' in 'foo'
    Out[57]: False

    In [58]: 1 < 2 or 'g' in 'foo'
    Out[58]: True

    In [59]: not True
    Out[59]: False

    In [60]: not not True
    Out[60]: True

Remember

- P and Q is True if both are True, else False
- P or Q is False if both are False, else True


More Functions
Let's talk a bit more about functions, which are all-important for good programming style.

Python has a number of built-in functions that are available without import.

We have already met some

    In [61]: max(19, 20)
    Out[61]: 20

    In [62]: range(4)
    Out[62]: [0, 1, 2, 3]

    In [63]: str(22)
    Out[63]: '22'

    In [64]: type(22)
    Out[64]: int

Two more useful built-in functions are any() and all()


    In [65]: bools = False, True, True

    In [66]: all(bools)   # True if all are True and False otherwise
    Out[66]: False

    In [67]: any(bools)   # False if all are False and True otherwise
    Out[67]: True

The full list of Python built-ins is here.

Now let's talk some more about user-defined functions constructed using the keyword def.

Why Write Functions?

User-defined functions are important for improving the clarity of your code by

- separating different strands of logic
- facilitating code reuse

(Writing the same thing twice is always a bad idea)

The basics of user-defined functions were discussed here <user_defined_functions>

The Flexibility of Python Functions

As we discussed in the previous lecture, Python functions are very flexible.

In particular

- Any number of functions can be defined in a given file
- Any object can be passed to a function as an argument, including other functions
- Functions can be (and often are) defined inside other functions


- A function can return any kind of object, including functions

We already gave an example of how straightforward it is to pass a function to a function.

Note that a function can have arbitrarily many return statements (including zero).

Execution of the function terminates when the first return is hit, allowing code like the following example

    def f(x):
        if x < 0:
            return 'negative'
        return 'nonnegative'

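A quick check of the early-return behavior (restating f from above):

```python
def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'

# The first return statement that executes ends the call
print(f(-3))   # 'negative'   -- the second return is never reached
print(f(1))    # 'nonnegative'
```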
Functions without a return statement automatically return the special Python object None.

Docstrings

Python has a system for adding comments to functions, modules, etc. called docstrings.

The nice thing about docstrings is that they are available at run-time.

For example, let's say that this code resides in file temp.py

    # Filename: temp.py

    def f(x):
        """
        This function squares its argument
        """
        return x**2

After it has been run in the IPython shell, the docstring is available as follows
    In [1]: run temp.py

    In [2]: f?
    Type:        function
    String Form: <function f at 0x2223320>
    File:        /home/john/temp/temp.py
    Definition:  f(x)
    Docstring:   This function squares its argument

    In [3]: f??
    Type:        function
    String Form: <function f at 0x2223320>
    File:        /home/john/temp/temp.py
    Definition:  f(x)
    Source:
    def f(x):
        """
        This function squares its argument
        """
        return x**2

With one question mark we bring up the docstring, and with two we get the source code as well


One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line.

For example, the definitions

    def f(x):
        return x**3

and

    f = lambda x: x**3

are entirely equivalent.

To see why lambda is useful, suppose that we want to calculate \int_0^2 x^3 \, dx (and have forgotten our high-school calculus)

The SciPy library has a function called quad that will do this calculation for us.

The syntax of the quad function is quad(f, a, b) where f is a function and a and b are numbers.

To create the function f(x) = x^3 we can use lambda as follows

    In [68]: from scipy.integrate import quad

    In [69]: quad(lambda x: x**3, 0, 2)
    Out[69]: (4.0, 4.440892098500626e-14)

Here the function created by lambda is said to be anonymous, because it was never given a name.

Keyword Arguments

If you did the exercises in the previous lecture, you would have come across the statement

    plot(x, 'b-', label="white noise")

In this call to Pylab's plot function, notice that the last argument is passed in name=argument syntax.

This is called a keyword argument, with label being the keyword.

Non-keyword arguments are called positional arguments, since their meaning is determined by order

- plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")

Keyword arguments are particularly useful when a function has a lot of arguments, in which case it's hard to remember the right order.

You can adopt keyword arguments in user-defined functions with no difficulty.

The next example illustrates the syntax

    def f(x, coefficients=(1, 1)):
        a, b = coefficients
        return a + b * x


After running this code we can call it as follows


    In [71]: f(2, coefficients=(0, 0))
    Out[71]: 0

    In [72]: f(2)   # Use default values (1, 1)
    Out[72]: 3

Notice that the keyword argument values we supplied in the definition of f become the default values

Coding Style and PEP8


To learn more about the Python programming philosophy type import this at the prompt.

Among other things, Python strongly favors consistency in programming style.

We've all heard the saying about consistency and little minds.

In programming, as in mathematics, quite the opposite is true.

A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to read, even if the author told you so on the first page.

In Python, the style that all good programs follow is set out in PEP8.

We recommend that you slowly learn it, and follow it in your programs

Exercises
Exercise 1

Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their inner product using zip()

Part 2: In one line, count the number of even numbers in 0,...,99

- Hint: x % 2 returns 0 if x is even, 1 otherwise

Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs (a, b) such that both a and b are even

Solution: View solution

Exercise 2

Consider the polynomial

    p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i        (2.1)

Write a function p such that p(x, coeff) computes the value in (2.1) given a point x and a list of coefficients coeff.

Try to use enumerate() in your loop.

Solution: View solution


Exercise 3

Write a function that takes a string as an argument and returns the number of capital letters in the string.

- Hint: 'foo'.upper() returns 'FOO'

Solution: View solution

Exercise 4

Write a function that takes two sequences seq_a and seq_b as arguments and returns True if every element in seq_a is also an element of seq_b, else False.

- By sequence we mean a list, a tuple or a string
- Do the exercise without using sets and set methods

Solution: View solution

Exercise 5

When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation.

Nevertheless, let's write our own function approximation routine as an exercise.

In particular, without using any imports, write a function linapprox that takes as arguments

- A function f mapping some interval [a, b] into R
- two scalars a and b providing the limits of this interval
- An integer n determining the number of grid points
- A number x satisfying a <= x <= b

and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points a = point[0] < point[1] < ... < point[n-1] = b.

Aim for clarity, not efficiency.

Solution: View solution

2.5 Object Oriented Programming


Overview of This Lecture
OOP is one of the major paradigms in programming, and nicely supported in Python.

OOP has become an important concept in modern software engineering because

- It can help facilitate clean, efficient code (when used well)
- The OOP design pattern fits well with the human brain

OOP is all about how to organize your code.

This topic is important! Proper organization of code is a critical determinant of productivity.

Moreover, OOP is a part of Python, and to progress further it's necessary to understand the basics


About OOP
OOP is supported in many programming languages:

- Python supports both procedural and object-oriented programming
- JAVA and Ruby are relatively pure OOP
- Fortran and MATLAB are mainly procedural, but with some OOP recently tacked on
- C is a procedural language, while C++ is C with OOP added on top

Let's look at general concepts before we specialize to Python.

Key Concepts

The traditional (non-OOP) paradigm is called procedural, and works as follows

- The program has a state that contains the values of its variables
- Functions are called to act on these data according to the task
- Data are passed back and forth via function calls

In contrast, in the OOP paradigm, data and functions are bundled together into objects.

An example is a Python list, which not only stores data, but also knows how to sort itself, etc.

    In [1]: x = [1, 5, 4]

    In [2]: x.sort()

    In [3]: x
    Out[3]: [1, 4, 5]

Here sort is a function that is part of the list object.

In the OOP setting, functions are usually called methods (e.g., sort is a list method).

Standard Terminology

A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex numbers).

It describes

- What kind of data the class stores
- What methods it has for acting on this data

An object or instance is a realization of the class, created from the blueprint

- Each instance has its own unique data
- Methods set out in the class definition act on this (and other) data

In Python, the data and methods of an object are collectively referred to as attributes.

Attributes are accessed via dotted attribute notation

- object_name.data


- object_name.method_name()

In the example


    In [4]: x = [1, 5, 4]

    In [5]: x.sort()

    In [6]: x.__class__
    Out[6]: list

- x is an object or instance, created from the definition for Python lists, but with its own particular data
- x.sort() and x.__class__ are two attributes of x
- dir(x) can be used to view all the attributes of x

Another Example

Let's look at an example of object-oriented design, this time from a third party module.

Python can be used to send, receive and organize email through low-level libraries that interact with mail servers.

The envelopes module by Tomek Wojcik provides a nice high-level interface to these kinds of tasks.

In the module, emails are represented as objects that

- contain data (recipient list, subject, attachments, body, etc.)
- possess methods that act on this and other data (add attachments, send the email, etc.)

Here's an example of usage provided by the developer

    from envelopes import Envelope

    envelope = Envelope(
        from_addr=(u'from@example.com', u'From Example'),
        to_addr=(u'to@example.com', u'To Example'),
        subject=u'Envelopes demo',
        text_body=u"I'm a helicopter!")
    envelope.add_attachment('/Users/bilbo/Pictures/helicopter.jpg')
    envelope.send('smtp.googlemail.com', login='from@example.com',
                  password='password', tls=True)

Here Envelope is a class, and the 6 lines of code starting envelope =

1. generate an instance of this class, containing instance data on sender, destination, etc.
2. bind the name envelope to the instance

(If you are interested, the class definition for Envelope can be found here)

The following two lines call the envelope methods add_attachment and send, the purpose of which is clear


Why is OOP Useful?

OOP is useful for the same reason that abstraction is useful: for recognizing and organizing common phenomena

- E.g., abstracting certain asymmetric information problems leads to the theory of principals and agents

For an example more relevant to OOP, consider the open windows on your desktop.

Windows have common functionality and individual data, which makes them suitable for implementing with OOP

- individual data: contents of specific windows
- common functionality: closing, maximizing, etc.

Your window manager almost certainly uses OOP to generate and manage these windows

- individual windows created as objects / instances from a class definition, with their own data
- common functionality implemented as methods, which all of these objects share

Another, more prosaic, use of OOP is data encapsulation.

Data encapsulation means storing variables inside some structure so that they are not directly accessible.

The alternative to this is filling the global namespace with variable names, frequently leading to conflicts

- Think of the global namespace as any name you can refer to without a dot in front of it

For example, the modules os and sys both define a different attribute called path.

The following code leads immediately to a conflict

    from os import path
    from sys import path

At this point, both variables have been brought into the global namespace, and the second will shadow the first.

A better idea is to replace the above with

    import os
    import sys

and then reference the path you want with either os.path or sys.path.

In this example, we see that modules provide one means of data encapsulation.

As will now become clear, OOP provides another
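A quick way to see why the qualified names don't clash — the two path attributes are entirely different kinds of objects (a sketch):

```python
import os
import sys
import types

# os.path is a module of filename-manipulation functions,
# while sys.path is a list of directories searched for imports
print(isinstance(os.path, types.ModuleType))   # True
print(isinstance(sys.path, list))              # True
```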

Defining Your Own Classes


As a first step we are going to try defining very simple classes, the main purpose of which is data encapsulation


Suppose that we are solving an economic model, and one small part of the model is a firm, characterized by

- a production function f(k) = k^{0.5}
- a discount factor β = 0.99
- a borrowing constraint κ = 10

One option is to declare all these as global variables.

A nicer one is to have a single firm object as the global variable, and f, β, κ accessible as firm.f, firm.beta, firm.kappa.

We can do this very easily by running the following code

    class Firm:
        pass   # In Python, "pass" essentially means do nothing

    firm = Firm()
    firm.f = lambda k: k**0.5
    firm.beta = 0.99
    firm.kappa = 10

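Before unpacking the code, here's a quick sketch of the resulting firm object in use (restating the lines above):

```python
class Firm:
    pass   # the simplest possible class definition

firm = Firm()
firm.f = lambda k: k**0.5
firm.beta = 0.99
firm.kappa = 10

# The model's primitives are now reachable through one global name
print(firm.f(4.0))   # 2.0
print(firm.beta)     # 0.99
print(firm.kappa)    # 10
```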
Here

- The first two lines form the simplest class definition possible in Python, in this case called Firm
- The third line creates an object called firm as an instance of class Firm
- The last three lines dynamically add attributes to the object firm

Data and Methods

The Firm example is only barely OOP; in fact you can do the same kind of thing with a MATLAB class or C struct.

Usually classes also define methods that act on the data contained by the object

- For example, the list method sort() in x.sort()

Let's try to build something a bit closer to this standard conception of OOP.

Since the notation used to define classes seems complex on first pass, we will start with a very simple (and rather contrived) example.

In particular, let's build a class to represent dice

- The data associated with a given dice will be the side facing up
- The only method will be a method to roll the dice (and hence change the state)

The following is pseudocode, a class definition in a mix of Python and plain English

    class Dice:

        data:
            current_face -- the side facing up (i.e., number of dots showing)


        methods:
            roll -- roll the dice (i.e., change current_face)

Here's actual Python code, in file dice.py


""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: dice.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ import random class Dice: faces = (1, 2, 3, 4, 5, 6) def __init__(self): self.current_face = 1 def roll(self): self.current_face = random.choice(Dice.faces)

There's some difficult notation here, but the broad picture is as follows:

- The faces variable is a class attribute: it will be shared by every member of the class (i.e., every dice)
- The current_face variable is an instance attribute: each dice we create will have its own version
- The __init__ method is a special method called a constructor, used to create instances (objects) from the class definition, with their own data
- The roll method rolls the dice, changing the state of a particular instance

Once we've run the program, the class definition is loaded into memory

    In [7]: Dice
    Out[7]: __main__.Dice

    In [8]: dir(Dice)
    Out[8]: ['__doc__', '__init__', '__module__', 'faces', 'roll']

    In [9]: Dice.faces
    Out[9]: (1, 2, 3, 4, 5, 6)

Let's now create two dice


    In [10]: d = Dice()

    In [11]: e = Dice()


These two statements implicitly call the __init__ method to build two instances of Dice.

When we roll each dice, the roll method will only affect the instance variable of that particular instance

    In [12]: d.roll()

    In [13]: d.current_face
    Out[13]: 2

    In [14]: e.roll()

    In [15]: e.current_face
    Out[15]: 5

Perhaps the most difficult part of all of this notation is the self keyword in the Dice class definition.

The simplest way to think of it is that self refers to a particular instance.

If we want to refer to instance variables, as opposed to class or global variables, then we need to use self.

In addition, we need to put self as the first argument to every method defined in the class.

Further Details

You might want to leave it at that for now, but if you still want to know more about self, here goes.

Consider the method call d.roll()

- This is in fact translated by Python into the call Dice.roll(d)

So in fact we are calling method roll() defined in class object Dice with instance d as the argument.

Hence, when roll() executes, self is bound to d.

In this way, self.current_face = random.choice(Dice.faces) affects d.current_face, which is what we want.

Example 2: The Quadratic Map

Let's look at one more example.

The quadratic map difference equation is given by

    x_{t+1} = 4 (1 - x_t) x_t,    x_0 ∈ [0, 1] given        (2.2)

Let's write a class for generating time series, where the data record the current location of the state x_t.

Here's one implementation, in file quadmap_class.py

    """
    Origin: QE by John Stachurski and Thomas J. Sargent
    Filename: quadmap_class.py
    Authors: John Stachurski, Thomas J. Sargent
    LastModified: 11/08/2013


""" class QuadMap: def __init__(self, initial_state): self.x = initial_state def update(self): "Apply the quadratic map to update the state." self.x = 4 * self.x * (1 - self.x) def generate_series(self, n): """ Generate and return a trajectory of length n, starting at the current state. """ trajectory = [] for i in range(n): trajectory.append(self.x) self.update() return trajectory

Here's an example of usage, after running the code


    In [16]: q = QuadMap(0.2)

    In [17]: q.x
    Out[17]: 0.2

    In [18]: q.update()

    In [19]: q.x
    Out[19]: 0.64000000000000012

    In [20]: q.generate_series(3)
    Out[20]: [0.64000000000000012, 0.92159999999999986, 0.28901376000000045]

Special Methods
Python provides certain special methods with which a number of neat tricks can be performed.

For example, recall that lists and tuples have a notion of length, and this length can be queried via the len function

    In [21]: x = (10, 20)

    In [22]: len(x)
    Out[22]: 2

If you want to provide a return value for the len function when applied to your user-defined object, use the __len__ special method


    class Foo:

        def __len__(self):
            return 42

Now we get
    In [23]: f = Foo()

    In [24]: len(f)
    Out[24]: 42

A special method we will use regularly is the __call__ method.

This method can be used to make your instances callable, just like functions

    class Foo:

        def __call__(self, x):
            return x + 42

After running we get


    In [25]: f = Foo()

    In [26]: f(8)   # Exactly equivalent to f.__call__(8)
    Out[26]: 50

Exercise 1 provides a more useful example

Exercises
Exercise 1

The empirical cumulative distribution function (ecdf) corresponding to a sample {X_i}_{i=1}^n is defined as

    F_n(x) := (1/n) \sum_{i=1}^{n} 1\{X_i \le x\}        (x ∈ R)        (2.3)

Here 1\{X_i \le x\} is an indicator function (one if X_i \le x and zero otherwise) and hence F_n(x) is the fraction of the sample that falls below x.

The Glivenko–Cantelli Theorem states that, provided that the sample is iid, the ecdf F_n converges to the true distribution function F.

Implement F_n as a class called ecdf, where

- A given sample {X_i}_{i=1}^n are the instance data, stored as self.observations
- The class implements a __call__ method that returns F_n(x) for any x

Your code should work as follows (modulo randomness)

    In [28]: from random import uniform

    In [29]: samples = [uniform(0, 1) for i in range(10)]


    In [30]: F = ecdf(samples)

    In [31]: F(0.5)   # Evaluate ecdf at x = 0.5
    Out[31]: 0.29

    In [32]: F.observations = [uniform(0, 1) for i in range(1000)]

    In [33]: F(0.5)
    Out[33]: 0.479

Solution: View solution

- Aim for clarity, not efficiency

Exercise 2

In an earlier exercise, you wrote a function for evaluating polynomials.

This exercise is an extension, where the task is to build a simple class called Polynomial for representing and manipulating polynomial functions such as

    p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n        (x ∈ R)        (2.4)

The instance data for the class Polynomial will be the coefficients (in the case of (2.4), the numbers a_0, ..., a_N).

Provide methods that

1. Evaluate the polynomial (2.4), returning p(x) for any x
2. Differentiate the polynomial, replacing the original coefficients with those of its derivative p'

Avoid using any import statements.

Solution: View solution

2.6 How it Works: Data, Variables and Names


Overview of This Lecture
The objective of the lecture is to provide deeper understanding of Python's execution model.

Understanding these details is important for writing larger programs.

You should feel free to skip this material on first pass and continue on to the applications.

We provide this material mainly as a reference, and for returning to occasionally to build your Python skills


Objects
We discussed objects briefly in the previous lecture.

Objects are usually thought of as instances of some class definition, typically combining both data and methods (functions).

For example
In [1]: x = ['foo', 'bar']

creates (an instance of) a list, possessing various methods (append, pop, etc.) In Python everything in memory is treated as an object This includes not just lists, strings, etc., but also less obvious things, such as functions (once they have been read into memory) modules (ditto) les opened for reading or writing integers, etc. At this point it is helpful to have a clearer idea of what an object is in Python In Python, an object is a collection of data and instructions held in computer memory that consists of 1. a type 2. some content 3. a unique identity 4. zero or more methods These concepts are discussed sequentially in the remainder of this section Type Python understands and provides for different types of objects, to accommodate different types of data The type of an object can be queried via type(object_name) For example
In [2]: s = 'This is a string'

In [3]: type(s)
Out[3]: str

In [4]: x = 42   # Now let's create an integer

In [5]: type(x)
Out[5]: int

The type of an object matters for many expressions. For example, the addition operator between two strings means concatenation
In [6]: '300' + 'cc' Out[6]: '300cc'

On the other hand, between two numbers it means ordinary addition


In [7]: 300 + 400 Out[7]: 700

Consider the following expression


In [8]: '300' + 400

Here we are mixing types, and it's unclear to Python whether the user wants to convert '300' to an integer and then add it to 400, or convert 400 to a string and then concatenate it with '300'. Some languages might try to guess, but Python is strongly typed. Type is important, and implicit type conversion is rare. Python will respond instead by raising a TypeError
--------------------------------------------------------------------------TypeError Traceback (most recent call last) <ipython-input-9-9b7dffd27f2d> in <module>() ----> 1 '300' + 400 TypeError: cannot concatenate 'str' and 'int' objects

To avoid the error, you need to clarify by changing the relevant type For example,
In [9]: int('300') + 400   # To add as numbers, change the string to an integer
Out[9]: 700

Content The content of an object seems like an obvious concept. For example, if we set x = 42 then it might seem that the content of x is just the number 42. But actually, there's more, as the following example shows
In [10]: x = 42

In [11]: x
Out[11]: 42

In [12]: x.imag
Out[12]: 0

In [13]: x.__class__ Out[13]: int

When Python creates this integer object, it stores with it various auxiliary information, such as the imaginary part, and the type. As discussed previously, any name following a dot is called an attribute of the object to the left of the dot. For example, imag and __class__ are attributes of x.

Identity In Python, each object has a unique identifier, which helps Python (and us) keep track of the object. The identity of an object can be obtained via the id() function
In [14]: y = 2.5

In [15]: z = 2.5

In [16]: id(y)
Out[16]: 166719660

In [17]: id(z)
Out[17]: 166719740

In this example, y and z happen to have the same value (i.e., 2.5), but they are not the same object. The identity of an object is in fact just the address of the object in memory.

Methods As discussed earlier, methods are functions that are bundled with objects. Formally, methods are attributes of objects that are callable (i.e., can be called as functions)
In [18]: x = ['foo', 'bar']

In [19]: callable(x.append)
Out[19]: True

In [20]: callable(x.__doc__)
Out[20]: False

Methods typically act on the data contained in the object they belong to, or combine that data with other data
In [21]: x = ['a', 'b']

In [22]: x.append('c')

In [23]: s = 'This is a string'

In [24]: s.upper()
Out[24]: 'THIS IS A STRING'

In [25]: s.lower() Out[25]: 'this is a string' In [26]: s.replace('This', 'That') Out[26]: 'That is a string'

A great deal of Python functionality is organized around method calls. For example, consider the following piece of code
In [27]: x = ['a', 'b']

In [28]: x[0] = 'aa'   # Item assignment using square bracket notation

In [29]: x
Out[29]: ['aa', 'b']

It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call. What actually happens is that Python calls the __setitem__ method, as follows
In [30]: x = ['a', 'b']

In [31]: x.__setitem__(0, 'aa')   # Equivalent to x[0] = 'aa'

In [32]: x
Out[32]: ['aa', 'b']
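Because assignment is routed through __setitem__, a class of our own can define that method and give square bracket assignment whatever meaning it likes. A small sketch (the class and its behavior are invented for illustration):

```python
class LoggingList(object):
    """Wraps a list and reports every square bracket assignment."""

    def __init__(self, data):
        self.data = list(data)

    def __setitem__(self, index, value):
        # Python translates obj[index] = value into this call
        print('setting index %s to %r' % (index, value))
        self.data[index] = value

x = LoggingList(['a', 'b'])
x[0] = 'aa'        # triggers __setitem__, printing a message
print(x.data)      # ['aa', 'b']
```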

(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something totally different.)

Everything is an Object Above we said that in Python everything is an object; let's look at this again. Consider, for example, functions. When Python reads a function definition, it creates a function object and stores it in memory. The following code illustrates
In [33]: def f(x): return x**2

In [34]: f
Out[34]: <function __main__.f>

In [35]: type(f)
Out[35]: function

In [36]: id(f)
Out[36]: 3074342220L

In [37]: f.func_name
Out[37]: 'f'

We can see that f has type, identity, attributes and so on, just like any other object. Likewise modules loaded into memory are treated as objects
In [38]: import math In [39]: id(math) Out[39]: 3074329380L

This uniform treatment of data in Python (everything is an object) helps keep the language simple and consistent

Iterables and Iterators


We've already said something about iterating in Python. Now let's look more closely at how it all works, focusing on Python's implementation of the for loop.

Iterators Iterators are a uniform interface to stepping through elements in a collection. Here we'll talk about using iterators; later we'll learn how to build our own. Formally, an iterator is an object with a next() method. For example, file objects are iterators. To see this, let's have another look at the US cities data
In [40]: f = open('us_cities.txt', 'r')

In [41]: f.next()
Out[41]: 'new york: 8244910\n'

In [42]: f.next()
Out[42]: 'los angeles: 3819702\n'

We see that file objects do indeed have a next method, and that calling this method returns the next line in the file. The objects returned by enumerate() are also iterators
In [43]: e = enumerate(['foo', 'bar'])

In [44]: e.next()
Out[44]: (0, 'foo')

In [45]: e.next()
Out[45]: (1, 'bar')

as are the reader objects from the csv module

In [46]: from csv import reader In [47]: f = open('test_table.csv', 'r') In [48]: nikkei_data = reader(f) In [49]: nikkei_data.next() Out[49]: ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'] In [50]: nikkei_data.next() Out[50]: ['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']

or objects returned by urllib.urlopen()


In [51]: import urllib In [52]: webpage = urllib.urlopen("http://www.cnn.com") In [53]: webpage.next() Out[53]: '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc In [54]: webpage.next() Out[54]: '<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'

In [55]: webpage.next() Out[55]: '<meta name="Description" content="CNN.com delivers the latest breaking news and information..'

Iterators in For Loops All iterators can be placed to the right of the in keyword in for loop statements. In fact this is how the for loop works: If we write
for x in iterator:
    <code block>

then the interpreter

- calls iterator.next() and binds x to the result
- executes the code block
- repeats until a StopIteration error occurs

So now you know how this magical-looking syntax works
f = open('somefile.txt', 'r')
for line in f:
    # do something

The interpreter just keeps

1. calling f.next() and binding line to the result
2. executing the body of the loop
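The same mechanics can be written out by hand. Here is a sketch of the while loop that a for loop is equivalent to, written with the built-in next(), which in the Python 2 of these lectures can also be spelled it.next():

```python
# Hand-rolled equivalent of:  for x in ['foo', 'bar']: results.append(x)
it = iter(['foo', 'bar'])     # obtain an iterator from the iterable
results = []
while True:
    try:
        x = next(it)          # ask the iterator for the next element
    except StopIteration:     # raised once the iterator is exhausted
        break
    results.append(x)         # the body of the for loop
print(results)                # ['foo', 'bar']
```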

This continues until a StopIteration error occurs.

Iterables You already know that we can put a Python list to the right of in in a for loop
for i in range(2):
    print 'foo'

So does that mean that a list is an iterator? The answer is no:


In [56]: type(x) Out[56]: list In [57]: x.next() --------------------------------------------------------------------------AttributeError Traceback (most recent call last) <ipython-input-54-e05f366da090> in <module>() ----> 1 x.next() AttributeError: 'list' object has no attribute 'next'

So why can we iterate over a list in a for loop? The reason is that a list is iterable (as opposed to an iterator). Formally, an object is iterable if it can be converted to an iterator using the built-in function iter(). Lists are one such object
In [59]: x = ['foo', 'bar'] In [60]: type(x) Out[60]: list In [61]: y = iter(x) In [62]: type(y) Out[62]: listiterator In [63]: y.next() Out[63]: 'foo' In [64]: y.next() Out[64]: 'bar' In [65]: y.next() --------------------------------------------------------------------------StopIteration Traceback (most recent call last) <ipython-input-62-75a92ee8313a> in <module>() ----> 1 y.next() StopIteration:

Many other objects are iterable, such as dictionaries and tuples. Of course, not all objects are iterable
In [66]: iter(42) --------------------------------------------------------------------------TypeError Traceback (most recent call last) <ipython-input-63-826bbd6e91fc> in <module>() ----> 1 iter(42) TypeError: 'int' object is not iterable

To conclude our discussion of for loops: for loops work on either iterators or iterables. In the second case, the iterable is converted into an iterator before the loop starts.

Iterators and built-ins Some built-in functions that act on sequences also work with iterables: max(), min(), sum(), all(), any(). For example
In [67]: x = [10, -10]

In [68]: max(x)
Out[68]: 10

In [69]: y = iter(x)

In [70]: type(y)
Out[70]: listiterator

In [71]: max(y)
Out[71]: 10

One thing to remember about iterators is that they are depleted by use
In [72]: x = [10, -10] In [73]: y = iter(x) In [74]: max(y) Out[74]: 10 In [75]: max(y) --------------------------------------------------------------------------ValueError Traceback (most recent call last) <ipython-input-72-1d3b6314f310> in <module>() ----> 1 max(y) ValueError: max() arg is an empty sequence

Names and Name Resolution


Variable Names in Python Consider the Python statement
In [76]: x = 42

We now know that when this statement is executed, Python creates an object of type int in your computer's memory, containing

- the value 42
- some associated attributes

But what is x itself? In Python, x is called a name, and the statement
In [76]: x = 42

binds the name x to the integer object we have just discussed. Under the hood, this process of binding names to objects is implemented as a dictionary (more about this in a moment). There is no problem binding two or more names to the one object, regardless of what that object is
In [77]: def f(string):      # Create a function called f
   ....:     print(string)   # that prints any string it's passed

In [78]: g = f

In [79]: id(g) == id(f)
Out[79]: True

In [80]: g('test')
test

In the first step, a function object is created, and the name f is bound to it. After binding the name g to the same object, we can use it anywhere we would use f. What happens when the number of names bound to an object goes to zero? Here's an example of this situation, where the name x is first bound to one object and then rebound to another
In [81]: x = 42

In [82]: id(x)
Out[82]: 164994764

In [83]: x = 'foo'   # No names bound to object 164994764

What happens here is that the first object, with identity 164994764, is garbage collected. In other words, the memory slot that stores that object is deallocated and returned to the operating system.
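The lecture does not go into reference counting, but the standard library lets you observe it: sys.getrefcount reports how many references an object has (one higher than you might expect, because passing the object to getrefcount itself creates a temporary reference). A sketch:

```python
import sys

x = ['foo', 'bar']
base = sys.getrefcount(x)          # inflated by one for the call itself

y = x                              # bind a second name to the same object
print(sys.getrefcount(x) - base)   # one additional reference

del y                              # unbind it again
print(sys.getrefcount(x) - base)   # back to zero extra references
```

When the last name is unbound, the count reaches zero and CPython reclaims the object, as described above.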


Namespaces Recall from the preceding discussion that the statement


In [84]: x = 42

binds the name x to the integer object on the right-hand side. We also mentioned that this process of binding x to the correct object is implemented as a dictionary. This dictionary is called a namespace. Definition: A namespace is a symbol table that maps names to objects in memory. Python uses multiple namespaces, creating them on the fly as necessary. For example, every time we import a module, Python creates a namespace for that module. To see this in action, suppose we write a script math2.py like this
# Filename: math2.py
pi = 'foobar'

Now we start the Python interpreter and import it


In [85]: import math2

Next let's import the math module from the standard library
In [86]: import math

Both of these modules have an attribute called pi


In [87]: math.pi Out[87]: 3.1415926535897931 In [88]: math2.pi Out[88]: 'foobar'

These two different bindings of pi exist in different namespaces, each one implemented as a dictionary. We can look at the dictionary directly, using module_name.__dict__
In [89]: import math

In [90]: math.__dict__
Out[90]: {'pow': <built-in function pow>, ..., 'pi': 3.1415926535897931, ...}  # Edited output

In [91]: import math2

In [92]: math2.__dict__
Out[92]: {..., '__file__': 'math2.py', 'pi': 'foobar', ...}  # Edited output

As you know, we access elements of the namespace using the dotted attribute notation


In [93]: math.pi Out[93]: 3.1415926535897931

In fact this is entirely equivalent to math.__dict__['pi']


In [94]: math.__dict__['pi'] == math.pi
Out[94]: True
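Since the namespace really is just a dictionary, binding a new attribute on the module inserts a new entry into it. A sketch (the name answer is made up for illustration; modifying another module's namespace like this is rarely good practice):

```python
import math

# Attribute lookup and dictionary lookup agree
print(math.__dict__['pi'] == math.pi)    # True

# Binding a new attribute adds an entry to the same dictionary
math.answer = 42
print(math.__dict__['answer'])           # 42
```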

Viewing Namespaces As we saw above, the math namespace can be printed by typing math.__dict__. Another way to see its contents is to type vars(math)
In [95]: vars(math) Out[95]: {'pow': <built-in function pow>,...

If you just want to see the names, you can type


In [96]: dir(math) Out[96]: ['__doc__', '__name__', 'acos', 'asin', 'atan',...

Notice the special names __doc__ and __name__. These are initialized in the namespace when any module is imported:

- __doc__ is the doc string of the module
- __name__ is the name of the module
In [97]: print math.__doc__
This module is always available.  It provides access to the
mathematical functions defined by the C standard.

In [98]: math.__name__
Out[98]: 'math'

Interactive Sessions In Python, all code executed by the interpreter runs in some module. What about commands typed at the prompt? These are also regarded as being executed within a module; in this case, a module called __main__. To check this, we can look at the current module name via the value of __name__ given at the prompt
In [99]: print(__name__) __main__

When we run a script using IPython's run command, the contents of the file are executed as part of __main__ too. To see this, let's create a file mod.py that prints its own __name__ attribute


# Filename: mod.py
print(__name__)

Now let's look at two different ways of running it in IPython


In [1]: import mod       # Standard import
mod

In [2]: run mod.py       # Run interactively
__main__

In the second case, the code is executed as part of __main__, so __name__ is equal to __main__. To see the contents of the namespace of __main__ we use vars() rather than vars(__main__). If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when you started up your session. If you prefer to see only the variables you have initialized, use whos
In [3]: x = 2

In [4]: y = 3

In [5]: import numpy as np

In [6]: whos
Variable   Type      Data/Info
------------------------------
np         module    <module 'numpy' from '/us<...>ages/numpy/__init__.pyc'>
x          int       2
y          int       3

The Global Namespace Python documentation often makes reference to the global namespace. The global namespace is the namespace of the module currently being executed. For example, suppose that we start the interpreter and begin making assignments. We are now working in the module __main__, and hence the namespace for __main__ is the global namespace. Next, we import a module called amodule
In [7]: import amodule

At this point, the interpreter creates a namespace for the module amodule and starts executing commands in the module. While this occurs, the namespace amodule.__dict__ is the global namespace. Once execution of the module finishes, the interpreter returns to the module from where the import statement was made


In this case it's __main__, so the namespace of __main__ again becomes the global namespace.

Local Namespaces When we call a function, the interpreter creates a local namespace for that function, and registers the variables in that namespace. Variables in the namespace are called local variables. After the function returns, the namespace is deallocated and lost. While the function is executing, we can view the contents of the local namespace with locals(). For example, consider
In [1]: def f(x):
   ...:     a = 2
   ...:     print locals()
   ...:     return a * x
   ...:

Now let's call the function


In [2]: f(1) {'a': 2, 'x': 1}

You can see the local namespace of f before it is destroyed.

The __builtins__ Namespace We have been using various built-in functions, such as max(), dir(), str(), list(), len(), range(), type(), etc. How does access to these names work? These definitions are stored in a module called __builtin__. They have their own namespace called __builtins__
In [12]: dir()
Out[12]: [..., '__builtins__', '__doc__', ...]  # Edited output

In [13]: dir(__builtins__)
Out[13]: [... 'iter', 'len', 'license', 'list', 'locals', ...]  # Edited output

We can access elements of the namespace as follows


In [14]: __builtins__.max Out[14]: <built-in function max>

But __builtins__ is special, because we can always access its names directly as well
In [15]: max
Out[15]: <built-in function max>

In [16]: __builtins__.max == max
Out[16]: True
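In a plain script, outside IPython, the same check can be made by importing the builtin module explicitly. A sketch using the Python 3 spelling builtins (the Python 2 used in these lectures calls the module __builtin__):

```python
import builtins          # Python 2: import __builtin__ as builtins

# The undotted name max resolves to the very same function object
print(builtins.max is max)       # True
print(builtins.max([3, 1, 2]))   # 3
```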

The next section explains how this works ...


Name resolution When we reference a name, how does the Python interpreter find the corresponding value? At any point of execution, there are either two or three namespaces that can be accessed directly. ("Directly" means without using a dot, as in pi rather than math.pi.) If the interpreter is not executing a function call, then these namespaces are

- The global namespace (of the module being executed)
- The builtin namespace

If we refer to a name such as x, the interpreter

- First looks in the global namespace for x
- If it's not there, then it looks in the built-in namespace
- If it's not there, it raises a NameError

If the interpreter is executing a function, then the namespaces are

- The local namespace of the function
- The global namespace (of the module being executed)
- The builtin namespace

Now the interpreter

- First looks in the local namespace
- Then in the global namespace
- Then in the builtin namespace
- If it's not there, it raises a NameError

To illustrate this further, consider a script test.py that looks as follows
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print "a = ", a, "y = ", y

What happens when we run this script?


In [17]: run test.py
a =  0 y =  11

In [18]: x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-401b30e3b8b5> in <module>()


----> 1 x NameError: name 'x' is not defined

First,

- The global namespace {} is created
- The function object is created, and g is bound to it within the global namespace
- The name a is bound to 0, again in the global namespace

Next g is called via y = g(10), leading to the following sequence of actions

- The local namespace for the function is created
- Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}
- Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to the result
- This value is returned, and y is bound to it in the global namespace
- Local x and a are discarded (and the local namespace is deallocated)

Note that the global a was not affected by the local a.

Mutable Versus Immutable Parameters This is a good time to say a little more about mutable vs immutable objects. Consider the code segment
def f(x):
    x = x + 1
    return x

x = 1
print f(x), x
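One way to watch the rebinding in this code directly, not shown in the lecture, is to print identities inside and outside the function (written here with Python 3 print calls):

```python
def f(x):
    print('inside, before rebinding:', id(x))
    x = x + 1                      # rebinds the *local* name x only
    print('inside, after rebinding:', id(x))
    return x

x = 1
original_id = id(x)
print(f(x))                        # 2
print(x, id(x) == original_id)     # 1 True -- the global x is untouched
```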

We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the value of x.

- First f and x are registered in the global namespace
- The call f(x) creates a local namespace and adds x to it, bound to 1
- Next, this local x is rebound to the new integer object 2, and this value is returned
- None of this affects the global x

However, it's a different story when we use a mutable data type such as a list
def f(x):
    x[0] = x[0] + 1
    return x


x = [1]
print f(x), x

This prints [2] as the value of f(x) and the same for x. Here's what happens:

- f is registered as a function in the global namespace
- x is bound to [1] in the global namespace
- The call f(x)
  - Creates a local namespace
  - Adds x to the local namespace, bound to [1]
  - The list [1] is modified to [2]
  - Returns the list [2]
  - The local namespace is deallocated, and local x is lost
- Global x has been modified
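If this side effect on the caller's list is unwanted, one common pattern, not covered in the lecture, is to copy the argument before modifying it (written here with Python 3 print calls):

```python
def f(x):
    x = list(x)        # shallow copy, so the caller's list is left alone
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)         # [2] [1] -- the global x is unchanged
```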

2.7 More Language Features


Overview of This Lecture
As with the last lecture, our advice is to skip this lecture on first pass, unless you have a burning desire to read it. It's here

1. as a reference, so we can link back to it when required, and
2. for those who have worked through a number of applications, and now want to learn more about the Python language

A variety of topics are treated in the lecture, including generators, exceptions and descriptors

Handling Errors
Sometimes it's possible to anticipate errors as we're writing code. For example, the unbiased sample variance of sample y_1, ..., y_n is defined as

s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2,    where \bar{y} is the sample mean

This can be calculated in NumPy using np.var. But if you were writing a function to handle such a calculation, you might anticipate a divide-by-zero error when the sample size is one.


One possible action is to do nothing; the program will just crash, and spit out an error message. But usually it's worth writing your code in a way that anticipates and deals with runtime errors that you think might arise. Why?

- Because the debugging information provided by the interpreter is often less useful than the information on possible errors you have in your head when writing code
- Because errors causing execution to stop are frustrating if you're in the middle of a large computation
- Because it reduces confidence in your code on the part of your users

Hence it's usually best to add code to your program that deals with errors as they occur.

Assertions One of the easiest ways to handle these kinds of problems is with the assert keyword. For example, pretend for a moment that the np.var function doesn't exist and we need to write our own
In [19]: def var(y):
   ....:     n = len(y)
   ....:     assert n > 1, 'Sample size must be greater than one.'
   ....:     return np.sum((y - y.mean())**2) / float(n-1)
   ....:

If we run this with an array of length one, the program will terminate and print our error message
In [20]: var([1]) --------------------------------------------------------------------------AssertionError Traceback (most recent call last) <ipython-input-20-0032ff8a150f> in <module>() ----> 1 var([1]) <ipython-input-19-cefafaec3555> in var(y) 1 def var(y): 2 n = len(y) ----> 3 assert n > 1, 'Sample size must be greater than one.' 4 return np.sum((y - y.mean())**2) / float(n-1) AssertionError: Sample size must be greater than one.

The advantage is that we can

- fail early, as soon as we know there will be a problem
- supply specific information on why a program is failing

Handling Errors During Runtime The approach used above is a bit limited, because it always leads to termination. Sometimes we can handle errors more gracefully, by treating special cases.


Let's look at how this is done.

Exceptions Here's an example of a common error type
In [43]: def f: File "<ipython-input-5-f5bdb6d29788>", line 1 def f: ^ SyntaxError: invalid syntax

Since illegal syntax cannot be executed, a syntax error terminates execution of the program. Here's a different kind of error, unrelated to syntax
In [44]: 1 / 0 --------------------------------------------------------------------------ZeroDivisionError Traceback (most recent call last) <ipython-input-17-05c9758a9c21> in <module>() ----> 1 1/0 ZeroDivisionError: integer division or modulo by zero

Here's another
In [45]: x1 = y1 --------------------------------------------------------------------------NameError Traceback (most recent call last) <ipython-input-23-142e0509fbd6> in <module>() ----> 1 x1 = y1 NameError: name 'y1' is not defined

And another
In [46]: 'foo' + 6 --------------------------------------------------------------------------TypeError Traceback (most recent call last) <ipython-input-20-44bbe7e963e7> in <module>() ----> 1 'foo' + 6 TypeError: cannot concatenate 'str' and 'int' objects

And another
In [47]: X = [] In [48]: x = X[0] --------------------------------------------------------------------------IndexError Traceback (most recent call last) <ipython-input-22-018da6d9fc14> in <module>() ----> 1 x = X[0] IndexError: list index out of range


On each occasion, the interpreter informs us of the error type: NameError, TypeError, IndexError, ZeroDivisionError, etc. In Python, these errors are called exceptions.

Catching Exceptions We can catch and deal with exceptions using try / except blocks. Here's a simple example
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print 'Error: division by zero.  Returned None'
        return None

When we call f we get the following output


In [50]: f(2)
Out[50]: 0.5

In [51]: f(0)
Error: division by zero.  Returned None

In [52]: f(0.0)
Error: division by zero.  Returned None

The error is caught and execution of the program is not terminated. Note that other error types are not caught. If we are worried the user might pass in a string, we can catch that error too
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print 'Error: Division by zero.  Returned None'
    except TypeError:
        print 'Error: Unsupported operation.  Returned None'
    return None

Here's what happens


In [54]: f(2)
Out[54]: 0.5

In [55]: f(0)
Error: Division by zero.  Returned None

In [56]: f('foo')
Error: Unsupported operation.  Returned None

If we feel lazy we can catch these errors together


def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print 'Error: Unsupported operation.  Returned None'
        return None

Here's what happens


In [58]: f(2)
Out[58]: 0.5

In [59]: f(0)
Error: Unsupported operation.  Returned None

In [60]: f('foo')
Error: Unsupported operation.  Returned None

If we feel extra lazy we can catch all error types as follows


def f(x):
    try:
        return 1.0 / x
    except:
        print 'Error.  Returned None'
        return None

In general it's better to be specific
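One refinement worth knowing, not shown above: an except clause can bind the caught exception to a name, so the handler can report exactly what went wrong. A sketch (the function name is our own; written with Python 3 print calls):

```python
def safe_reciprocal(x):
    """Return 1.0 / x, or None on failure, reporting the reason."""
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError) as e:
        print('Error:', e)        # e describes the specific failure
        return None

print(safe_reciprocal(2))         # 0.5
print(safe_reciprocal(0))         # prints the error message, then None
```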

Generators
A generator is a kind of iterator (i.e., it implements a next() method). We will study two ways to build generators: generator expressions and generator functions.

Generator Expressions The easiest way to build generators is using generator expressions. Just like a list comprehension, but with round brackets. Here is the list comprehension:
In [1]: singular = ('dog', 'cat', 'bird')

In [2]: type(singular)
Out[2]: tuple

In [3]: plural = [string + 's' for string in singular]

In [4]: plural
Out[4]: ['dogs', 'cats', 'birds']

In [5]: type(plural)
Out[5]: list


And here is the generator expression


In [6]: singular = ('dog', 'cat', 'bird')

In [7]: plural = (string + 's' for string in singular)

In [8]: type(plural)
Out[8]: generator

In [9]: plural.next()
Out[9]: 'dogs'

In [10]: plural.next()
Out[10]: 'cats'

In [11]: plural.next()
Out[11]: 'birds'

Since sum() can be called on iterators, we can do this


In [12]: sum((x * x for x in range(10))) Out[12]: 285

The function sum() calls next() to get the items, adds successive terms. In fact, we can omit the outer brackets in this case
In [13]: sum(x * x for x in range(10)) Out[13]: 285

Generator Functions The most flexible way to create generator objects is to use generator functions. Let's look at some examples.

Example 1 Here's a very simple example of a generator function
def f():
    yield 'start'
    yield 'middle'
    yield 'end'

It looks like a function, but uses a keyword yield that we haven't met before. Let's see how it works after running this code
In [15]: type(f) Out[15]: function In [16]: gen = f()


In [17]: gen Out[17]: <generator object f at 0x3b66a50> In [18]: gen.next() Out[18]: 'start' In [19]: gen.next() Out[19]: 'middle' In [20]: gen.next() Out[20]: 'end' In [21]: gen.next() --------------------------------------------------------------------------StopIteration Traceback (most recent call last) <ipython-input-21-b2c61ce5e131> in <module>() ----> 1 gen.next() StopIteration:

The generator function f() is used to create generator objects (in this case gen). Generators are iterators, because they support a next() method. The first call to gen.next()

- Executes code in the body of f() until it meets a yield statement
- Returns that value to the caller of gen.next()

The second call to gen.next() starts executing from the next line
def f():
    yield 'start'
    yield 'middle'   # This line!
    yield 'end'

and continues until the next yield statement. At that point it returns the value following yield to the caller of gen.next(), and so on. When the code block ends, the generator throws a StopIteration error.

Example 2 Our next example receives an argument x from the caller
def g(x):
    while x < 100:
        yield x
        x = x * x

Let's see how it works


In [24]: g Out[24]: <function __main__.g>


In [25]: gen = g(2) In [26]: type(gen) Out[26]: generator In [27]: gen.next() Out[27]: 2 In [28]: gen.next() Out[28]: 4 In [29]: gen.next() Out[29]: 16 In [30]: gen.next() --------------------------------------------------------------------------StopIteration Traceback (most recent call last) <ipython-input-32-b2c61ce5e131> in <module>() ----> 1 gen.next() StopIteration:

The call gen = g(2) binds gen to a generator. Inside the generator, the name x is bound to 2. When we call gen.next()

- The body of g() executes until the line yield x, and the value of x is returned
- Note that the value of x is retained inside the generator

When we call gen.next() again, execution continues from where it left off
def g(x):
    while x < 100:
        yield x
        x = x * x   # execution continues from here

When x < 100 fails, the generator throws a StopIteration error. Incidentally, the loop inside the generator can be infinite
def g(x):
    while 1:
        yield x
        x = x * x
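Such an infinite generator is safe to work with because values are only produced on demand. A sketch of pulling a fixed number of values from one, using the built-in next() (spelled gen.next() in the Python 2 of these lectures):

```python
def squares(x):
    # An infinite generator: yields x, x**2, x**4, ...
    while True:
        yield x
        x = x * x

gen = squares(2)
first_four = [next(gen) for _ in range(4)]   # only four values are computed
print(first_four)                            # [2, 4, 16, 256]
```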

Advantages of Iterators What's the advantage of using an iterator here? Suppose we want to sample a binomial(n, 0.5). One way to do it is as follows


In [32]: n = 10000000

In [33]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

In [34]: sum(draws)

But we are creating two huge lists here, range(n) and draws. This uses lots of memory and is very slow. If we make n even bigger then this happens
In [35]: n = 1000000000 In [36]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)] MemoryError Traceback (most recent call last) <ipython-input-9-20d1ec1dae24> in <module>() ----> 1 draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

We can avoid these problems using iterators. Here is the generator function
import random

def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1

Now lets do the sum


In [39]: n = 10000000 In [40]: draws = f(n) In [41]: draws Out[41]: <generator object at 0xb7d8b2cc> In [42]: sum(draws) Out[42]: 4999141

In summary, iterables avoid the need to create big lists/tuples, and provide a uniform interface to iteration that can be used transparently in for loops
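Incidentally, for a one-off computation like the sum above, a generator expression gives the same memory-light behaviour without defining a separate function (note that in Python 2 you would write xrange(n) rather than range(n) to avoid building a list):

```python
import random

n = 100000  # a smaller n, just for illustration

# The parenthesized expression is a generator, not a list, so the
# draws are produced one at a time as sum() consumes them
total = sum(random.uniform(0, 1) < 0.5 for i in range(n))
print(total)  # roughly n / 2
```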

Descriptors
Descriptors solve a common problem regarding management of variables
To understand the issue, consider a Car class, performing a variety of tasks that we won't bother to describe
Suppose that this class defines the variables miles_till_service and kms_till_service, which give the distance until next service in miles and kilometers respectively
A highly simplified version of the class might look as follows

class Car(object):

    def __init__(self, miles_till_service=1000):
        self.miles_till_service = miles_till_service
        self.kms_till_service = miles_till_service * 1.61

One potential problem we might have here is that a user alters one of these variables but not the other

In [2]: car = Car()

In [3]: car.miles_till_service
Out[3]: 1000

In [4]: car.kms_till_service
Out[4]: 1610.0

In [5]: car.miles_till_service = 6000

In [6]: car.kms_till_service
Out[6]: 1610.0

In the last two lines we see that miles_till_service and kms_till_service are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables, the other is automatically updated
In Python, this is solved using descriptors, an implementation of which could look as follows

"""
Origin: QE by John Stachurski and Thomas J. Sargent
Filename: descriptor_eg.py
Authors: John Stachurski, Thomas J. Sargent
LastModified: 11/08/2013
"""

class Car(object):

    def __init__(self, miles_till_service=1000):
        self.__miles_till_service = miles_till_service
        self.__kms_till_service = miles_till_service * 1.61

    def set_miles(self, value):
        self.__miles_till_service = value
        self.__kms_till_service = value * 1.61

    def set_kms(self, value):
        self.__kms_till_service = value
        self.__miles_till_service = value / 1.61


    def get_miles(self):
        return self.__miles_till_service

    def get_kms(self):
        return self.__kms_till_service

    miles_till_service = property(get_miles, set_miles)
    kms_till_service = property(get_kms, set_kms)

The names __miles_till_service and __kms_till_service are arbitrary names we are using to store the values of the variables
The objects miles_till_service and kms_till_service are properties, use of which invokes the various get and set methods
In any case, we now get the desired behaviour

In [8]: car = Car()

In [9]: car.miles_till_service
Out[9]: 1000

In [10]: car.miles_till_service = 6000

In [11]: car.kms_till_service
Out[11]: 9660.0

For further information you can refer to the documentation
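As an aside, the same behaviour is often written with the @property decorator instead of explicit property(...) calls; here is a sketch of the Car class in that style (equivalent logic, different syntax):

```python
class Car(object):

    def __init__(self, miles_till_service=1000):
        # Assigning through the property keeps both values in sync
        self.miles_till_service = miles_till_service

    @property
    def miles_till_service(self):
        return self.__miles

    @miles_till_service.setter
    def miles_till_service(self, value):
        self.__miles = value
        self.__kms = value * 1.61

    @property
    def kms_till_service(self):
        return self.__kms

    @kms_till_service.setter
    def kms_till_service(self, value):
        self.__kms = value
        self.__miles = value / 1.61

car = Car()
car.miles_till_service = 6000
print(car.kms_till_service)  # 9660.0
```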

Recursive Function Calls


This is not something that you will use every day, but it is still useful and you should learn it at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing x_t for some t when

x_{t+1} = 2 x_t,    x_0 = 1    (2.5)

Obviously the answer is 2^t
We can compute this easily enough with a loop

def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x

We can also use a recursive solution, as follows


def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)

What happens here is that each successive call uses its own frame in the stack
a frame is where the local variables of a given function call are held
the stack is memory used to process function calls, a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution
We'll meet less contrived applications of recursion later on
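The iterative and recursive implementations can be checked against each other and against the closed form 2**t (a quick self-contained test):

```python
def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x

def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)

# Both agree with the closed-form solution 2**t
for t in (0, 1, 5, 10):
    assert x_loop(t) == x(t) == 2**t
print(x(10))  # 1024
```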

Exercises
Exercise 1
The Fibonacci numbers are defined by

x_{t+1} = x_t + x_{t-1},    x_0 = 0,    x_1 = 1    (2.6)

The first few numbers in the sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the t-th Fibonacci number for any t
Solution: View solution

Exercise 2
Complete the following code, and test it using the test_table.csv file in the main repository

def column_iterator(target_file, column_number):
    """A generator function for CSV files.

    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print date

Solution: View solution


Exercise 3
Suppose we have a text file numbers.txt containing the following lines

prices
3
8

7

21

Using try except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers
Solution: View solution


CHAPTER THREE

THE SCIENTIFIC LIBRARIES


The second part of the course covers Python's most important scientific libraries

3.1 NumPy
Let's be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable by reference to the real world. In science consensus is irrelevant. What is relevant is reproducible results. Michael Crichton

Overview of This Lecture


NumPy is a first-rate library for numerical programming
Widely used in academia, finance and industry
Mature, fast and stable
In this lecture we introduce the NumPy array data type and fundamental array processing operations
We assume that NumPy is installed on the machine you are using; see this page for instructions

Introduction to NumPy
The essential problem that NumPy solves is fast array processing
This is necessary primarily because iteration via loops in interpreted languages (Python, MATLAB, Ruby, etc.) is relatively slow
Loops in compiled languages like C and Fortran can be orders of magnitude faster
Why? Because interpreted languages convert commands to machine code and execute them one by one, a difficult process to optimize
Does that mean that we should just switch to C or Fortran for everything?


The answer is a definite no; high productivity languages should be chosen over high speed languages for the majority of tasks (see this discussion)
But it does mean that we need libraries like NumPy, through which operations can be sent in batches to optimized C and Fortran code
Let's begin by considering NumPy arrays, which power almost all of the scientific Python ecosystem
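To make the batching idea concrete, here is a small comparison of a pure Python loop with the equivalent single NumPy call; both compute the same sum of squares, but the NumPy version hands the whole operation to compiled code at once (an illustrative sketch):

```python
import numpy as np

z = np.linspace(0, 1, 1000)

# Pure Python: one interpreted iteration per element
total_loop = 0.0
for value in z:
    total_loop += value**2

# Vectorized: squaring and summing run in compiled C code
total_np = np.sum(z**2)

# Same answer, up to floating point rounding
assert abs(total_loop - total_np) < 1e-8
```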

NumPy Arrays
The most important thing that NumPy defines is an array data type formally called a numpy.ndarray
For example, the np.zeros function returns a numpy.ndarray of zeros

In [1]: import numpy as np

In [2]: a = np.zeros(3)

In [3]: a
Out[3]: array([ 0.,  0.,  0.])

In [4]: type(a)
Out[4]: numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that
Data must be homogeneous (all elements of the same type)
These types must be one of the data types (dtypes) provided by NumPy
The most important of these dtypes are:
float64: 64 bit floating point number
float32: 32 bit floating point number
int64: 64 bit integer
int32: 32 bit integer
bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc
On most machines, the default dtype for arrays is float64

In [7]: a = np.zeros(3)

In [8]: type(a[0])
Out[8]: numpy.float64

If we want to use integers we can specify as follows:


In [9]: a = np.zeros(3, dtype=int)

In [10]: type(a[0])
Out[10]: numpy.int32

Shape and Dimension

When we create an array such as

In [11]: z = np.zeros(10)

z is a flat array with no dimension, neither row vector nor column vector

The dimension is recorded in the shape attribute, which is a tuple

In [12]: z.shape
Out[12]: (10,)   # Note syntax for tuple with one element

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)
To give it dimension, we can change the shape attribute

In [13]: z.shape = (10, 1)

In [14]: z
Out[14]:
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])

In [15]: z = np.zeros(4)

In [16]: z.shape = (2, 2)

In [17]: z
Out[17]:
array([[ 0.,  0.],
       [ 0.,  0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z = np.zeros((2, 2))

Creating Arrays

As we've seen, the np.zeros function creates an array of zeros
You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data

In [18]: z = np.empty(3)

In [19]: z
Out[19]: array([  8.90030222e-307,   4.94944794e+173,   4.04144187e-262])

The numbers you see here are garbage values (Python allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those memory slots are interpreted as float64 values)
To set up a grid of evenly spaced numbers use np.linspace
In [20]: z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements

To create an identity matrix use either np.identity or np.eye


In [21]: z = np.identity(2)

In [22]: z
Out[22]:
array([[ 1.,  0.],
       [ 0.,  1.]])

In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array

In [23]: z = np.array([10, 20])                 # ndarray from Python list

In [24]: z
Out[24]: array([10, 20])

In [25]: type(z)
Out[25]: numpy.ndarray

In [26]: z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'

In [27]: z
Out[27]: array([ 10.,  20.])

In [28]: z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists

In [29]: z
Out[29]:
array([[1, 2],
       [3, 4]])

See also np.asarray, which performs a similar function, but does not make a distinct copy of data already in a NumPy array

In [11]: na = np.linspace(10, 20, 2)

In [12]: na is np.asarray(na)   # Does not copy NumPy arrays
Out[12]: True

In [13]: na is np.array(na)     # Does make a new copy --- perhaps unnecessarily
Out[13]: False

To read in the array data from a text file containing numeric data use np.loadtxt or np.genfromtxt; see the documentation for details

Array Indexing

For a flat array, indexing is the same as Python sequences:

In [30]: z = np.linspace(1, 2, 5)

In [31]: z
Out[31]: array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])

In [32]: z[0]
Out[32]: 1.0

In [33]: z[0:2]   # Slice numbering is left closed, right open
Out[33]: array([ 1.  ,  1.25])

In [34]: z[-1]
Out[34]: 2.0

For 2D arrays the syntax is as follows:

In [35]: z = np.array([[1, 2], [3, 4]])

In [36]: z
Out[36]:
array([[1, 2],
       [3, 4]])

In [37]: z[0, 0]
Out[37]: 1

In [38]: z[0, 1]
Out[38]: 2

And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows

In [39]: z[0,:]
Out[39]: array([1, 2])

In [40]: z[:,1]
Out[40]: array([2, 4])

NumPy arrays of integers can also be used to extract elements

In [41]: z = np.linspace(2, 4, 5)


In [42]: z
Out[42]: array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [43]: indices = np.array((0, 2, 3))

In [44]: z[indices]
Out[44]: array([ 2. ,  3. ,  3.5])

Finally, an array of dtype bool can be used to extract elements

In [45]: z
Out[45]: array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [46]: d = np.array([0, 1, 1, 0, 0], dtype=bool)

In [47]: d
Out[47]: array([False,  True,  True, False, False], dtype=bool)

In [48]: z[d]
Out[48]: array([ 2.5,  3. ])

We'll see why this is useful below
An aside: all elements of an array can be set equal to one number using slice notation

In [49]: z = np.empty(3)

In [50]: z
Out[50]: array([ -1.25236750e-041,   0.00000000e+000,   5.45693855e-313])

In [51]: z[:] = 42

In [52]: z
Out[52]: array([ 42.,  42.,  42.])

Array Methods

Arrays have useful methods, all of which are highly optimized

In [53]: A = np.array((4, 3, 2, 1))

In [54]: A
Out[54]: array([4, 3, 2, 1])

In [55]: A.sort()   # Sorts A in place

In [56]: A
Out[56]: array([1, 2, 3, 4])

In [57]: A.sum()    # Sum
Out[57]: 10

In [58]: A.mean()   # Mean
Out[58]: 2.5

In [59]: A.max()      # Max
Out[59]: 4

In [60]: A.argmax()   # Returns the index of the maximal element
Out[60]: 3

In [61]: A.cumsum()   # Cumulative sum of the elements of A
Out[61]: array([ 1,  3,  6, 10])

In [62]: A.cumprod()  # Cumulative product of the elements of A
Out[62]: array([ 1,  2,  6, 24])

In [63]: A.var()      # Variance
Out[63]: 1.25

In [64]: A.std()      # Standard deviation
Out[64]: 1.1180339887498949

In [65]: A.shape = (2, 2)

In [66]: A.T          # Equivalent to A.transpose()
Out[66]:
array([[1, 3],
       [2, 4]])

Another method worth knowing is searchsorted()
If z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element of z satisfying z[i] >= a

In [67]: z = np.linspace(2, 4, 5)

In [68]: z
Out[68]: array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [69]: z.searchsorted(2.2)
Out[69]: 1

In [70]: z.searchsorted(2.5)
Out[70]: 1

In [71]: z.searchsorted(2.6)
Out[71]: 2

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [72]: a = np.array((4, 3, 2, 1))

In [73]: np.sum(a)
Out[73]: 10

In [74]: np.mean(a)
Out[74]: 2.5


Operations on Arrays
Algebraic Operations

The algebraic operators +, -, *, / and ** all act elementwise on arrays

In [75]: a = np.array([1, 2, 3, 4])

In [76]: b = np.array([5, 6, 7, 8])

In [77]: a + b
Out[77]: array([ 6,  8, 10, 12])

In [78]: a * b
Out[78]: array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [79]: a + 10
Out[79]: array([11, 12, 13, 14])

Scalar multiplication is similar

In [81]: a = np.array([1, 2, 3, 4])

In [82]: a * 10
Out[82]: array([10, 20, 30, 40])

The two dimensional arrays follow the same general rules

In [86]: A = np.ones((2, 2))

In [87]: B = np.ones((2, 2))

In [88]: A + B
Out[88]:
array([[ 2.,  2.],
       [ 2.,  2.]])

In [89]: A + 10
Out[89]:
array([[ 11.,  11.],
       [ 11.,  11.]])

In [90]: A * B
Out[90]:
array([[ 1.,  1.],
       [ 1.,  1.]])

Matrix Multiplication

In particular, A * B is not the matrix product, it is an elementwise product
To do matrix multiplication we can either convert the arrays into the numpy.matrix data type or use the np.dot function
The first case is discussed below; for now let's look at dot

In [137]: A = np.ones((2, 2))

In [138]: B = np.ones((2, 2))

In [139]: np.dot(A, B)
Out[139]:
array([[ 2.,  2.],
       [ 2.,  2.]])

With np.dot we can also take the inner product of two flat arrays

In [91]: A = np.array([1, 2])

In [92]: B = np.array([10, 20])

In [93]: np.dot(A, B)   # Returns a scalar in this case
Out[93]: 50

In fact we can use dot when one element is a Python list or tuple

In [94]: A = np.empty((2, 2))

In [95]: A
Out[95]:
array([[  3.48091887e-262,   1.14802984e-263],
       [  3.61513512e-313,  -1.25232371e-041]])

In [96]: np.dot(A, (0, 1))
Out[96]: array([  1.14802984e-263,  -1.25232371e-041])

Here dot knows we are postmultiplying, so (0, 1) is treated as a column vector

Comparisons

As a rule, comparisons on arrays are done elementwise

In [97]: z = np.array([2, 3])

In [98]: y = np.array([2, 3])

In [99]: z == y
Out[99]: array([ True,  True], dtype=bool)

In [100]: y[0] = 5

In [101]: z == y
Out[101]: array([False,  True], dtype=bool)

In [102]: z != y
Out[102]: array([ True, False], dtype=bool)

The situation is similar for >, <, >= and <=
We can also do comparisons against scalars

In [103]: z = np.linspace(0, 10, 5)

In [104]: z
Out[104]: array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [105]: z > 3
Out[105]: array([False, False,  True,  True,  True], dtype=bool)

This is particularly useful for conditional extraction

In [106]: b = z > 3

In [107]: b
Out[107]: array([False, False,  True,  True,  True], dtype=bool)

In [108]: z[b]
Out[108]: array([  5. ,   7.5,  10. ])

Of course we can (and frequently do) perform this in one step

In [109]: z[z > 3]
Out[109]: array([  5. ,   7.5,  10. ])

Vectorized Functions

NumPy provides versions of the standard functions log, exp, sin, etc. that act elementwise on arrays

In [110]: z = np.array([1, 2, 3])

In [111]: np.sin(z)
Out[111]: array([ 0.84147098,  0.90929743,  0.14112001])

This eliminates the need for explicit element-by-element loops such as

for i in range(n):
    y[i] = np.sin(z[i])

Because they act elementwise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for universal functions
As we saw above, the usual arithmetic operations (+, *, etc.) also work elementwise, and combining these with the ufuncs gives a very large set of fast elementwise functions

In [112]: z
Out[112]: array([1, 2, 3])

In [113]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)
Out[113]: array([ 0.24197072,  0.05399097,  0.00443185])

Not all user defined functions will act elementwise
For example, passing this function a NumPy array causes a ValueError


def f(x):
    return 1 if x > 0 else 0

In this situation you should use the vectorized NumPy function np.where

In [114]: import numpy as np

In [115]: x = np.random.randn(4)

In [116]: x
Out[116]: array([-0.25521782,  0.38285891, -0.98037787, -0.083662  ])

In [117]: np.where(x > 0, 1, 0)
Out[117]: array([0, 1, 0, 0])

Although it's usually better to hand code vectorized functions from vectorized NumPy operations, at a pinch you can use np.vectorize

In [118]: def f(x): return 1 if x > 0 else 0

In [119]: f = np.vectorize(f)

In [120]: f(x)   # Passing same vector x as previous example
Out[120]: array([0, 1, 0, 0])

NumPy Matrices

Because np.dot can be inconvenient for expressions involving the multiplication of many matrices, NumPy provides the numpy.matrix class
For instances of this data type, the * operator means matrix (as opposed to elementwise) multiplication
NumPy arrays can be converted to the numpy.matrix class using the np.matrix function

In [122]: A = np.ones(4)

In [123]: b = np.array([0, 1])

In [124]: A.shape = (2, 2)

In [125]: b.shape = (2, 1)

In [126]: A = np.matrix(A)   # Convert from array to matrix

In [127]: b = np.matrix(b)

In [128]: A * b              # Matrix multiplication
Out[128]:
matrix([[ 1.],
        [ 1.]])

However, it's easy to get mixed up between NumPy arrays and NumPy matrices
For this reason, the numpy.matrix type is avoided by many programmers, including us

Other NumPy Functions

NumPy provides some additional functionality related to scientific programming
For example

In [131]: A = np.array([[1, 2], [3, 4]])

In [132]: np.linalg.det(A)   # Compute the determinant
Out[132]: -2.0000000000000004

In [133]: np.linalg.inv(A)   # Compute the inverse
Out[133]:
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [134]: Z = np.random.randn(10000)   # Generate standard normals

In [135]: y = np.random.binomial(10, 0.5, size=1000)   # 1,000 draws from Bin(10, 0.5)

In [136]: y.mean()
Out[136]: 5.0369999999999999

However, all of this functionality is also available in SciPy, a collection of modules that build on top of NumPy
We'll cover the SciPy versions in more detail soon

Exercises
Exercise 1
Consider the polynomial expression

p(x) = a_0 + a_1 x + a_2 x^2 + ... + a_N x^N = \sum_{n=0}^{N} a_n x^n    (3.1)

Earlier, you wrote a simple function p(x, coeff) to evaluate (3.1) without considering efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don't use this class)
Hint: Use np.cumprod()
Solution: View solution

Exercise 2
Let q be a NumPy array of length n with q.sum() == 1
Suppose that q represents a probability mass function
We wish to generate a discrete random variable x such that P{x = i} = q_i


In other words, x takes values in range(len(q)) and x = i with probability q[i]
The standard (inverse transform) algorithm is as follows:
Divide the unit interval [0, 1] into n subintervals I_0, I_1, ..., I_{n-1} such that the length of I_i is q_i
Draw a uniform random variable U on [0, 1] and return the i such that U is in I_i
The probability of drawing i is the length of I_i, which is equal to q_i
We can implement the algorithm as follows

from random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]

If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]
It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called discreteRV, where
the data for an instance of the class is the vector of probabilities q
the class has a draw() method, which returns one draw according to the algorithm described above
If you can, write the method so that draw(k) returns k draws from q
Solution: View solution

Exercise 3
Recall our earlier discussion of the empirical distribution function
We came up with the implementation

class ecdf:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)

Your task is to

1. Make the __call__ method more efficient using NumPy
2. Add a method that plots the ECDF over [a, b], where a and b are method parameters
Solution: View solution

3.2 SciPy
SciPy builds on top of NumPy to provide common tools for scientific programming, such as
linear algebra
numerical integration
interpolation
optimization
distributions and random number generation
signal processing
etc., etc
Like NumPy, SciPy is stable, mature and widely used
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as LAPACK, BLAS, etc.
It's not really necessary to learn SciPy as a whole; a better approach is to learn each relevant feature as required
You can browse from the top of the documentation tree to see what's available
In this lecture we aim only to highlight some useful parts of the package

SciPy versus NumPy


SciPy is a package that contains various tools that are built on top of NumPy, using its array data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file

# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg

The majority of SciPy's functionality resides in its subpackages scipy.optimize, scipy.integrate, scipy.stats, etc.

We will review the major subpackages below
Note that these subpackages need to be imported separately

import scipy.optimize
from scipy.integrate import quad

Although SciPy imports NumPy, the standard approach is to start scientific programs with

import numpy as np

and then import bits and pieces from SciPy as needed

from scipy.integrate import quad
from scipy.optimize import brentq
# etc

This approach helps clarify what functionality belongs to what package, and we will follow it in these lectures

Statistics
The scipy.stats subpackage supplies
numerous random variable objects (densities, cumulative distributions, random sampling, etc.)
some estimation procedures
some statistical tests

Random Variables and Distributions

Recall that numpy.random provides functions for generating random variables

In [1]: import numpy as np

In [2]: np.random.beta(5, 5, size=3)
Out[2]: array([ 0.6167565 ,  0.67994589,  0.32346476])

This generates a draw from the distribution below when a, b = 5, 5

f(x; a, b) = x^{a-1} (1 - x)^{b-1} / \int_0^1 u^{a-1} (1 - u)^{b-1} du,    0 <= x <= 1    (3.2)

Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats, which provides all of this functionality as well as random number generation in a single consistent interface
Here's an example of usage

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

3.2. SCIPY

110

import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show

q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()

The following plot is produced

In this code we created a so-called rv_frozen object, via the call q = beta(5, 5)
The "frozen" part of the notation relates to the fact that q represents a particular distribution with a particular set of parameters
Once we've done so, we can then generate random numbers, evaluate the density, etc., all from this fixed distribution

In [14]: q.cdf(0.4)   # Cumulative distribution function
Out[14]: 0.2665676800000002

In [15]: q.pdf(0.4)   # Density function
Out[15]: 2.0901888000000004

In [16]: q.ppf(0.8)   # Quantile (inverse cdf) function
Out[16]: 0.63391348346427079

In [17]: q.mean()
Out[17]: 0.5


The general syntax for creating these objects is


identifier = scipy.stats.distribution_name(shape_parameters)

where distribution_name is one of the distribution names in scipy.stats
There are also two keyword arguments, loc and scale:

identifier = scipy.stats.distribution_name(shape_parameters, loc=c, scale=d)

These transform the original random variable X into Y = c + dX
The methods rvs, pdf, cdf, etc. are transformed accordingly
Before finishing this section, we note that there is an alternative way of calling the methods described above
For example, the previous code can be replaced by

import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show

obs = beta.rvs(5, 5, size=2000)   # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
show()
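The effect of the loc and scale keywords described above can be checked directly; since Y = c + dX, a Beta(5, 5) variable shifted by loc=1 and stretched by scale=2 lives on [1, 3] with mean 2 (a small check):

```python
from scipy.stats import beta

# X ~ Beta(5, 5) has mean 0.5 on [0, 1]; Y = 1 + 2 X lives on [1, 3]
q = beta(5, 5, loc=1, scale=2)

print(q.mean())     # 2.0
print(q.ppf(0.5))   # the median is also 2.0, by symmetry
```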

Other Goodies in scipy.stats

There are also many statistical functions in scipy.stats
For example, scipy.stats.linregress implements simple linear regression

In [19]: from scipy.stats import linregress

In [20]: x = np.random.randn(200)

In [21]: y = 2 * x + 0.1 * np.random.randn(200)

In [22]: gradient, intercept, r_value, p_value, std_err = linregress(x, y)

In [23]: gradient, intercept
Out[23]: (1.9962554379482236, 0.008172822032671799)

To see the full list of statistical functions, consult the documentation

Roots and Fixed Points


A root of a real function f on [a, b] is an x in [a, b] such that f(x) = 0
For example, if we plot the function

f(x) = sin(4 (x - 1/4)) + x + x^{20} - 1,    x in [0, 1]    (3.3)

we get

The unique root is approximately 0.408
Let's consider some numerical techniques for finding roots

Bisection

One of the most common algorithms for numerical root finding is bisection
To understand the idea, recall the well known game where Player A thinks of a secret number between 1 and 100
Player B asks if it's less than 50
If yes, B asks if it's less than 25
If no, B asks if it's less than 75
And so on
This is bisection
Here's a fairly simplistic implementation of the algorithm in Python
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: bisection.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ def bisect(f, a, b, tol=10e-5): """ Implements the bisection root finding algorithm, assuming that f is a real-valued function on [a, b] satisfying f(a) < 0 < f(b). """ lower, upper = a, b while upper - lower > tol: middle = 0.5 * (upper + lower) # === if root is between lower and middle === #

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

3.2. SCIPY

113

if f(middle) > 0: lower, upper = lower, middle # === if root is between middle and upper === # else: lower, upper = middle, upper return 0.5 * (upper + lower)

In fact SciPy provides its own bisection function, which we now test using the function f defined in (3.3)

In [24]: from scipy.optimize import bisect

In [25]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1

In [26]: bisect(f, 0, 1)
Out[26]: 0.40829350427936706

The Newton-Raphson Method

Another very common root-finding algorithm is the Newton-Raphson method
In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:
When the function is well-behaved, the Newton-Raphson method is faster than bisection
When the function is less well-behaved, the Newton-Raphson might fail
Let's investigate this using the same function f, first looking at potential instability

In [27]: from scipy.optimize import newton

In [28]: newton(f, 0.2)   # Start the search at initial condition x = 0.2
Out[28]: 0.40829350427935679

In [29]: newton(f, 0.7)   # Start the search at x = 0.7 instead
Out[29]: 0.70017000000002816

The second initial condition leads to failure of convergence
On the other hand, using IPython's timeit magic, we see that newton can be much faster

In [32]: timeit bisect(f, 0, 1)
1000 loops, best of 3: 261 us per loop

In [33]: timeit newton(f, 0.2)
10000 loops, best of 3: 60.2 us per loop

Hybrid Methods

So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm is robust but relatively slow
This illustrates a general principle
If you have specific knowledge about your function, you might be able to exploit it to generate efficiency
If not, then algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid methods
These methods typically combine a fast method with a robust method in the following manner:
1. Attempt to use a fast method
2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm
In scipy.optimize, the function brentq is such a hybrid method, and a good default

In [35]: brentq(f, 0, 1)
Out[35]: 0.40829350427936706

In [36]: timeit brentq(f, 0, 1)
10000 loops, best of 3: 63.2 us per loop

Here the correct solution is found and the speed is almost the same as newton

Multivariate Root Finding

Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK
See the documentation for details

Fixed Points

SciPy has a function for finding (scalar) fixed points too

In [1]: from scipy.optimize import fixed_point

In [2]: fixed_point(lambda x: x**2, 10.0)   # 10.0 is an initial guess
Out[2]: 1.0

If you don't get good results, you can always switch back to the brentq root finder, since the fixed point of a function f is the root of g(x) := x - f(x)
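As an illustration, the classic fixed point of the cosine function (approximately 0.739) can be computed either way; handing g(x) = x - cos(x) to brentq gives the same answer as fixed_point (a quick sketch):

```python
import numpy as np
from scipy.optimize import brentq, fixed_point

# The fixed point of cos solves x = cos(x), i.e. g(x) = x - cos(x) = 0
root = brentq(lambda x: x - np.cos(x), 0, 1)
fp = fixed_point(np.cos, 0.5)

print(root)  # approximately 0.739085
assert abs(root - fp) < 1e-6
```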

Optimization
Most numerical packages provide only functions for minimization Maximization can be performed by recalling that the maximizer of a function f on domain D is the minimizer of f on D Minimization is closely related to root nding: For smooth functions, interior optima correspond to roots of the rst derivative T HOMAS S ARGENT AND J OHN S TACHURSKI February 5, 2014


The speed/robustness trade-off described above is present with numerical optimization too. Unless you have some prior information you can exploit, it's usually best to use hybrid methods. For constrained, univariate (i.e., scalar) minimization, a good hybrid option is fminbound
In [9]: from scipy.optimize import fminbound

In [10]: fminbound(lambda x: x**2, -1, 2)   # Search in [-1, 2]
Out[10]: 0.0

Multivariate Optimization Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg, fmin_bfgs, and fmin_ncg. Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc, fmin_cobyla. See the documentation for details

Integration
Most numerical integration methods work by computing the integral of an approximating polynomial. The resulting error depends on how well the polynomial fits the integrand, which in turn depends on how regular the integrand is. In SciPy, the relevant module for numerical integration is scipy.integrate. A good default for univariate integration is quad
In [13]: from scipy.integrate import quad

In [14]: integral, error = quad(lambda x: x**2, 0, 1)

In [15]: integral
Out[15]: 0.33333333333333337

In fact quad is an interface to a very standard numerical integration routine in the Fortran library QUADPACK, which uses adaptive Gauss-Kronrod quadrature. There are other options for univariate integration; a useful one is fixed_quad, which is fast and hence works well inside for loops. There are also functions for multivariate integration. See the documentation for more details
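To make the polynomial-approximation idea concrete, here is a hand-rolled composite Simpson's rule, which fits a quadratic over each pair of subintervals. quad's adaptive algorithm is far more sophisticated, but the underlying principle is the same:

```python
def simpson(func, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals on [a, b]."""
    assert n % 2 == 0, "n must be even"
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2   # Simpson weights 1, 4, 2, 4, ..., 1
        total += weight * func(a + i * h)
    return total * h / 3

print(simpson(lambda x: x**2, 0, 1))   # 1/3, exact for polynomials up to cubics
```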

Linear Algebra
We saw that NumPy provides a module for linear algebra called linalg


SciPy also provides a module for linear algebra with the same name. The latter is not an exact superset of the former, but overall it has more functionality. We leave you to investigate the set of available routines

Exercises
Exercise 1 Recall that we previously discussed the concept of recursive function calls. Write a recursive implementation of the bisection function described above, which we repeat here for convenience
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: bisection.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ def bisect(f, a, b, tol=10e-5): """ Implements the bisection root finding algorithm, assuming that f is a real-valued function on [a, b] satisfying f(a) < 0 < f(b). """ lower, upper = a, b while upper - lower > tol: middle = 0.5 * (upper + lower) # === if root is between lower and middle === # if f(middle) > 0: lower, upper = lower, middle # === if root is between middle and upper === # else: lower, upper = middle, upper return 0.5 * (upper + lower)

Test it on the function f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1 discussed above. Solution: View solution

3.3 Matplotlib
Overview
We've already generated quite a few figures in these lectures using Matplotlib


Matplotlib is interesting in that it provides two fairly different interfaces. The first is designed to mimic MATLAB graphics functionality, and is aimed mainly at beginners. The second is object oriented, more Pythonic and more powerful, but requires some more effort to learn. In this lecture we'll cover both, with a focus on the second method

The MATLAB-style API


Matplotlib is very easy to get started with, thanks to its simple MATLAB-style API (Application Programming Interface). Here's an example plot
from pylab import *

x = linspace(0, 10, 200)
y = sin(x)
plot(x, y, 'b-', linewidth=2)
show()

The figure it generates looks as follows

If you've run these commands inside the IPython notebook with the --pylab inline flag, the figure will appear embedded in your browser, and the from pylab import * line is unnecessary. If you've run these commands in IPython without the --pylab inline flag, it will appear as a separate window, like so


The buttons at the bottom of the window allow you to manipulate the figure and then save it if you wish. Note that the pylab module combines core parts of matplotlib, numpy and scipy. Hence from pylab import * pulls NumPy functions like linspace and sin into the global namespace. Most people start working with Matplotlib using this MATLAB style

The Object-Oriented Approach


The MATLAB style API is simple and convenient, but it's also a bit limited and somewhat un-Pythonic. For example, we are pulling lots of names into the global namespace, which is not always a good idea. Also, there are a lot of implicit function calls behind the scenes, and Python tends to favor explicit over implicit (type import this in the IPython (or Python) shell and look at the second line). Hence it's worthwhile adopting the alternative, object oriented API. Here's the code corresponding to the preceding figure using this second approach
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'b-', linewidth=2)
plt.show()

You can see there's a bit more typing, but the more explicit declarations will help when we need fine-grained control. Details: the form of the import statement import matplotlib.pyplot as plt is standard. Here the call fig, ax = plt.subplots() returns a pair, where fig is a Figure instance (like a blank canvas) and ax is an AxesSubplot instance (think of a frame for plotting in). The plot() function is actually a method of ax

Customization Here we've changed the line to red and added a legend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()

We've also used alpha to make the line slightly transparent, which makes it look smoother. Unfortunately the legend is obscuring the line


This can be fixed by replacing ax.legend() with ax.legend(loc='upper center')

If everything is properly configured, then adding LaTeX is trivial


import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

The r in front of the label string tells Python that this is a raw string. The figure now looks as follows. Controlling the ticks, adding titles and so on is also straightforward
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()

It's straightforward to generate multiple plots on the same axes


Here's an example that randomly generates three normal densities and adds a label with their mean
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = r'$\mu = {0:.2f}$'.format(m)
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()

At other times we want multiple subplots in one figure. Here's an example that generates 6 histograms. Notice the lines at the start used to control the LaTeX font
from matplotlib import rc
rc('font', **{'family': 'serif', 'serif': ['Palatino']})
rc('text', usetex=True)
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from random import uniform

num_rows, num_cols = 3, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(8, 12))
for i in range(num_rows):
    for j in range(num_cols):
        m, s = uniform(-1, 1), uniform(1, 2)
        x = norm.rvs(loc=m, scale=s, size=100)
        axes[i, j].hist(x, alpha=0.6, bins=20)
        t = r'$\mu = {0:.1f}, \quad \sigma = {1:.1f}$'.format(m, s)
        axes[i, j].set_title(t)
        axes[i, j].set_xticks([-4, 0, 4])
        axes[i, j].set_yticks([])
plt.show()

The output looks as follows


A Customizing Function Perhaps you will find a set of customizations that you regularly use. Suppose we usually prefer our axes to go through the origin, and to have a grid. Here's a nice example from this blog of how the object-oriented API can be used to build a custom subplots function that implements these changes. Read carefully through the code and see if you can follow what's going on
import matplotlib.pyplot as plt
import numpy as np

def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')
    ax.grid()
    return (fig, ax)

fig, ax = subplots()  # Call the local version, not plt.subplots()
x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()

Here's the figure it produces (note axes through the origin and the grid). The custom subplots function
1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code

Further Reading
- The Matplotlib gallery provides many examples
- A nice Matplotlib tutorial by Nicolas Rougier, Mike Muller and Gael Varoquaux
- mpltools allows easy switching between plot styles
- Seaborn facilitates common statistics plots in Matplotlib


3.4 Pandas
Overview of this Lecture
Pandas is a package of fast, efficient data analysis tools for Python. Just as NumPy provides the basic array type plus core array operations, pandas defines some fundamental structures for working with data and endows them with methods that form the first steps of data analysis. The most important data type defined by pandas is a DataFrame, which is an object for storing related columns of data. In this sense, you can think of a DataFrame as analogous to a (highly optimized) Excel spreadsheet, or as a structure for storing the X matrix in a linear regression. In the same way that NumPy specializes in basic array operations and leaves the rest of scientific tool development to other packages (e.g., SciPy, Matplotlib), pandas focuses on the fundamental data types and their methods, leaving other packages to add more sophisticated statistical functionality. The strengths of pandas lie in
- reading in data
- manipulating rows and columns
- adjusting indices
- working with dates and time series


- sorting, grouping, re-ordering and general data munging [1]
- dealing with missing values, etc., etc.

This lecture will provide a basic introduction. Throughout the lecture we will assume that the following imports have taken place
In [1]: import pandas as pd

In [2]: import numpy as np

Series
Perhaps the two most important data types defined by pandas are the DataFrame and Series types. You can think of a Series as a column of data, such as a collection of observations on a single variable
In [4]: s = pd.Series(np.random.randn(4), name='daily returns')

In [5]: s
Out[5]:
0    0.430271
1    0.617328
2   -0.265421
3   -0.836113
Name: daily returns

Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the values being daily returns on their shares Pandas Series are built on top of NumPy arrays, and support many similar operations
In [6]: s * 100
Out[6]:
0    43.027108
1    61.732829
2   -26.542104
3   -83.611339
Name: daily returns

In [7]: np.abs(s)
Out[7]:
0    0.430271
1    0.617328
2    0.265421
3    0.836113
Name: daily returns

But Series provide more than NumPy arrays


[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.


Not only do they have some additional (statistically oriented) methods


In [8]: s.describe()
Out[8]:
count    4.000000
mean    -0.013484
std      0.667092
min     -0.836113
25%     -0.408094
50%      0.082425
75%      0.477035
max      0.617328

But their indices are more flexible


In [9]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']

In [10]: s
Out[10]:
AMZN    0.430271
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
Name: daily returns

Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction that the items in the dictionary all have the same type, in this case, floats). In fact you can use much of the same syntax as Python dictionaries
In [11]: s['AMZN']
Out[11]: 0.43027108469945924

In [12]: s['AMZN'] = 0

In [13]: s
Out[13]:
AMZN    0.000000
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
Name: daily returns

In [14]: 'AAPL' in s
Out[14]: True

DataFrames
As mentioned above, a DataFrame is somewhat like a spreadsheet, or a structure for storing the data matrix in a regression. While a Series is one individual column of data, a DataFrame is all the columns. Let's look at an example, reading in data from the CSV file test_pwt.csv in the main repository


Here's the contents of test_pwt.csv, which is a small excerpt from the Penn World Tables
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg" "Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5788042896" "Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7200975332" "India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.072205773" "Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.266688415" "Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","11.658954494" "South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427","5.7265463933" "United States","USA","2000","282171.957","1","9898700","72.347054303","6.0324539789" "Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"

Here we're in IPython, so we have access to shell commands such as ls, as well as the usual Python commands
In [15]: ls test_pw*  # List all files starting with 'test_pw' -- check CSV file is in present working directory
test_pwt.csv

Now let's read the data in using pandas' read_csv function


In [28]: df = pd.read_csv('test_pwt.csv')

In [29]: type(df)
Out[29]: pandas.core.frame.DataFrame

In [30]: df
Out[30]:
         country country isocode  year          POP       XRAT           tcgdp         cc         cg
0      Argentina            ARG  2000    37335.653   0.999500   295072.218690  75.716805   5.578804
1      Australia            AUS  2000    19053.186   1.724830   541804.652100  67.759026   6.720098
2          India            IND  2000  1006300.297  44.941600  1728144.374800  64.575551  14.072206
3         Israel            ISR  2000     6114.570   4.077330   129253.894230  64.436451  10.266688
4         Malawi            MWI  2000    11801.505  59.543808     5026.221784  74.707624  11.658954
5   South Africa            ZAF  2000    45064.098   6.939830   227242.369490  72.718710   5.726546
6  United States            USA  2000   282171.957   1.000000  9898700.000000  72.347054   6.032454
7        Uruguay            URY  2000     3219.793  12.099592    25255.961693  78.978740   5.108068

We can select particular rows using standard Python array slicing notation
In [13]: df[2:5]
Out[13]:
  country country isocode  year          POP       XRAT           tcgdp         cc         cg
2   India            IND  2000  1006300.297  44.941600  1728144.374800  64.575551  14.072206
3  Israel            ISR  2000     6114.570   4.077330   129253.894230  64.436451  10.266688
4  Malawi            MWI  2000    11801.505  59.543808     5026.221784  74.707624  11.658954

To select columns, we can pass a list containing the names of the desired columns represented as strings
In [14]: df[['country', 'tcgdp']]
Out[14]:
         country           tcgdp
0      Argentina   295072.218690
1      Australia   541804.652100
2          India  1728144.374800
3         Israel   129253.894230
4         Malawi     5026.221784
5   South Africa   227242.369490
6  United States  9898700.000000
7        Uruguay    25255.961693

To select a mix of both we can use the ix attribute


In [21]: df.ix[2:5, ['country', 'tcgdp']]
Out[21]:
        country           tcgdp
2         India  1728144.374800
3        Israel   129253.894230
4        Malawi     5026.221784
5  South Africa   227242.369490

Let's imagine that we're only interested in population and total GDP (tcgdp). One way to strip the data frame df down to only these variables is as follows
In [31]: keep = ['country', 'POP', 'tcgdp']

In [32]: df = df[keep]

In [33]: df
Out[33]:
         country          POP           tcgdp
0      Argentina    37335.653   295072.218690
1      Australia    19053.186   541804.652100
2          India  1006300.297  1728144.374800
3         Israel     6114.570   129253.894230
4         Malawi    11801.505     5026.221784
5   South Africa    45064.098   227242.369490
6  United States   282171.957  9898700.000000
7        Uruguay     3219.793    25255.961693

Here the index 0, 1,..., 7 is redundant, because we can use the country names as an index. To do this, first let's pull out the country column using the pop method
In [34]: countries = df.pop('country')

In [35]: type(countries)
Out[35]: pandas.core.series.Series

In [36]: countries
Out[36]:
0        Argentina
1        Australia
2            India
3           Israel
4           Malawi
5     South Africa
6    United States
7          Uruguay
Name: country

In [37]: df
Out[37]:
           POP           tcgdp
0    37335.653   295072.218690
1    19053.186   541804.652100
2  1006300.297  1728144.374800
3     6114.570   129253.894230
4    11801.505     5026.221784
5    45064.098   227242.369490
6   282171.957  9898700.000000
7     3219.793    25255.961693

In [38]: df.index = countries

In [39]: df
Out[39]:
                       POP           tcgdp
country
Argentina        37335.653   295072.218690
Australia        19053.186   541804.652100
India          1006300.297  1728144.374800
Israel            6114.570   129253.894230
Malawi           11801.505     5026.221784
South Africa     45064.098   227242.369490
United States   282171.957  9898700.000000
Uruguay           3219.793    25255.961693

Let's give the columns slightly better names


In [40]: df.columns = 'population', 'total GDP'

In [41]: df
Out[41]:
               population       total GDP
country
Argentina       37335.653   295072.218690
Australia       19053.186   541804.652100
India         1006300.297  1728144.374800
Israel           6114.570   129253.894230
Malawi          11801.505     5026.221784
South Africa    45064.098   227242.369490
United States  282171.957  9898700.000000
Uruguay          3219.793    25255.961693

Population is in thousands; let's revert to single units


In [66]: df['population'] = df['population'] * 1e3

In [67]: df
Out[67]:
               population       total GDP
country
Argentina        37335653   295072.218690
Australia        19053186   541804.652100
India          1006300297  1728144.374800
Israel            6114570   129253.894230
Malawi           11801505     5026.221784
South Africa     45064098   227242.369490
United States   282171957  9898700.000000
Uruguay           3219793    25255.961693

Next we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is in millions
In [74]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']

In [75]: df
Out[75]:
               population       total GDP    GDP percap
country
Argentina        37335653   295072.218690   7903.229085
Australia        19053186   541804.652100  28436.433261
India          1006300297  1728144.374800   1717.324719
Israel            6114570   129253.894230  21138.672749
Malawi           11801505     5026.221784    425.896679
South Africa     45064098   227242.369490   5042.647686
United States   282171957  9898700.000000  35080.381854
Uruguay           3219793    25255.961693   7843.970620

One of the nice things about pandas DataFrame and Series objects is that they have methods for plotting and visualization that work through Matplotlib For example, we can easily generate a bar plot of GDP per capita
In [76]: df['GDP percap'].plot(kind='bar')
Out[76]: <matplotlib.axes.AxesSubplot at 0x2f22ed0>

In [77]: import matplotlib.pyplot as plt

In [78]: plt.show()

The following figure is produced. At the moment the data frame is ordered alphabetically on the countries; let's change it to GDP per capita
In [83]: df = df.sort_index(by='GDP percap', ascending=False)

In [84]: df
Out[84]:
               population       total GDP    GDP percap
country
United States   282171957  9898700.000000  35080.381854
Australia        19053186   541804.652100  28436.433261
Israel            6114570   129253.894230  21138.672749
Argentina        37335653   295072.218690   7903.229085
Uruguay           3219793    25255.961693   7843.970620
South Africa     45064098   227242.369490   5042.647686
India          1006300297  1728144.374800   1717.324719
Malawi           11801505     5026.221784    425.896679

Plotting as before now yields

On-Line Data Sources


pandas makes it straightforward to query several common Internet databases programmatically. One particularly important one is FRED, a vast collection of time series data maintained by the St. Louis Fed. For example, suppose that we are interested in the unemployment rate. Via FRED, the entire series for the US civilian rate can be downloaded directly by entering this URL into your browser
http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv

(Equivalently, click here: http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv)


This request returns a CSV file, which will be handled by your default application for this class of files. Alternatively, we can access the CSV file from within a Python program. This can be done with a variety of methods. We start with a relatively low level method, and then return to pandas

Accessing Data with urllib2 One option is to use urllib2, a standard Python library for requesting data over the Internet. To begin, try the following code on your computer
In [36]: import urllib2

In [37]: web_page = urllib2.urlopen('http://cnn.com')

If there's no error message, then the call has succeeded. If you do get an error, then there are two likely causes:
1. You are not connected to the Internet (hopefully this isn't the case)
2. Your machine is accessing the Internet through a proxy server, and Python isn't aware of this

In the second case, you can either
- switch to another machine (for example, log in to Wakari)
- solve your proxy problem by reading the documentation

Assuming that all is working, you can now proceed to using the web_page object returned by the call urllib2.urlopen('http://cnn.com'). This object behaves very much like a file object; for example, it has a next method
In [38]: web_page.next()
Out[38]: '\n'

In [39]: web_page.next()
Out[39]: '<!DOCTYPE HTML>\n'

In [40]: web_page.next()
Out[40]: '<html lang="en-US">\n'

The next method returns successive lines from the file returned by CNN's web server, in this case the top level HTML page at the site cnn.com. Other methods include read, readline, readlines, etc. The same idea can be used to access the CSV file discussed above
In [56]: url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'

In [57]: source = urllib2.urlopen(url)


In [58]: source.next()
Out[58]: 'DATE,VALUE\r\n'

In [59]: source.next()
Out[59]: '1948-01-01,3.4\r\n'

In [60]: source.next()
Out[60]: '1948-02-01,3.8\r\n'
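Incidentally, objects returned by urlopen share their file-like protocol with in-memory streams, so parsing logic like this can be prototyped offline. The sketch below uses Python 3's next() function; the .next() method shown above is the Python 2 spelling:

```python
import io

# io.StringIO mimics the file-like object returned by urlopen
fake_source = io.StringIO("DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n")

header = next(fake_source).strip()   # consume the header line
rows = []
for line in fake_source:             # iterate over remaining lines
    date, value = line.strip().split(",")
    rows.append((date, float(value)))

print(header)   # DATE,VALUE
print(rows)     # [('1948-01-01', 3.4), ('1948-02-01', 3.8)]
```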

We could now write some additional code to parse this text and store it as an array... But this is unnecessary; pandas' read_csv function can handle the task for us
In [69]: source = urllib2.urlopen(url)

In [70]: data = pd.read_csv(source, index_col=0, parse_dates=True, header=None)

The data has been read into a pandas DataFrame called data that we can now manipulate in the usual way
In [71]: type(data)
Out[71]: pandas.core.frame.DataFrame

In [72]: data.head()  # A useful method to get a quick look at a data frame
Out[72]:
                1
0
DATE        VALUE
1948-01-01    3.4
1948-02-01    3.8
1948-03-01    4.0
1948-04-01    3.9

In [73]: data.describe()
Out[73]:
          1
count   786
unique   81
top     5.4
freq     31

Accessing Data with pandas Although it is worth understanding the low level procedures, for the present case pandas can take care of all these messy details (pandas puts a simple API (Application Programming Interface) on top of the kind of low level function calls we've just covered). For example, we can obtain the same unemployment data for the period 2006-2012 inclusive as follows
In [77]: import pandas.io.data as web

In [78]: import datetime as dt  # Standard Python date / time library

In [79]: start, end = dt.datetime(2006, 1, 1), dt.datetime(2012, 12, 31)

In [80]: data = web.DataReader('UNRATE', 'fred', start, end)

In [81]: type(data)
Out[81]: pandas.core.frame.DataFrame

In [82]: data.plot()
Out[82]: <matplotlib.axes.AxesSubplot at 0xcf79390>

In [83]: import matplotlib.pyplot as plt

In [84]: plt.show()

(If you're working in the IPython notebook, the last two lines can probably be omitted.) The resulting figure looks as follows

Data from the World Bank Let's look at one more example of downloading and manipulating data, this time from the World Bank. The World Bank collects and organizes data on a huge range of indicators. For example, here we find data on government debt as a ratio to GDP: http://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS/countries

If you click on DOWNLOAD DATA you will be given the option to download the data as an Excel file. The next program does this for you, parses the data from the Excel file into a pandas DataFrame, and plots time series for France, Germany, the US and Australia
import pandas as pd
import matplotlib.pyplot as plt
from pandas.io.parsers import ExcelFile
import urllib

# == Get data and read into file gd.xls == #
wb_data_file_dir = "http://api.worldbank.org/datafiles/"
file_name = "GC.DOD.TOTL.GD.ZS_Indicator_MetaData_en_EXCEL.xls"
url = wb_data_file_dir + file_name
urllib.urlretrieve(url, "gd.xls")

# == Parse data into a DataFrame == #
gov_debt_xls = ExcelFile('gd.xls')
govt_debt = gov_debt_xls.parse('Sheet1', index_col=1, na_values=['NA'])

# == Take desired values and plot == #
govt_debt = govt_debt.transpose()
govt_debt = govt_debt[['AUS', 'DEU', 'FRA', 'USA']]
govt_debt = govt_debt[36:]
govt_debt.plot(lw=2)
plt.show()

(The file is wb_download.py from the main repository.) The figure it produces looks as follows

(Missing line segments indicate missing data values.) Actually pandas includes high-level functions for downloading World Bank data; for example, see http://pandas.pydata.org/pandas-docs/dev/remote_data.html#world-bank


Exercises
Exercise 1 Write a program to calculate the percentage price change since the start of the year for the following shares
ticker_list = {'INTC': 'Intel',
               'MSFT': 'Microsoft',
               'IBM': 'IBM',
               'BHP': 'BHP',
               'RSH': 'RadioShack',
               'TM': 'Toyota',
               'AAPL': 'Apple',
               'AMZN': 'Amazon',
               'BA': 'Boeing',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google',
               'SNE': 'Sony',
               'PTR': 'PetroChina'}

Use pandas to download the data from Yahoo Finance. Hint: Try replacing data = web.DataReader('UNRATE', 'fred', start, end) with data = web.DataReader('AAPL', 'yahoo', start, end) in the code above. Plot the result as a bar graph, such as this one (of course actual results will vary)

Solution: View solution

3.5 IPython Shell and Notebook


"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan

Overview
As you know by now, IPython is not really a scientific library; it's an enhanced Python command interface oriented towards scientific workflow. Having a good grasp of IPython can make a very large difference in your productivity. Here we briefly review some of IPython's more important features. The first part of the lecture focuses on the IPython shell; the second part focuses on the IPython notebook

The IPython Shell


The IPython shell is what we use for day to day programming. When we first met the IPython shell, we learned how to start the shell and run programs. Let's now look at some of its features in more depth

Line Magics As discussed earlier, any Python command can be typed into an IPython shell
In [1]: 'foo' * 2
Out[1]: 'foofoo'

A program foo.py in the current working directory can be executed using run
In [2]: run foo.py

Note that run is not a Python command. Rather it is an IPython magic, one of a set of very useful commands particular to IPython. Sometimes IPython magics need to be prefixed by % (e.g., %run foo.py); you can toggle this by running %automagic. We'll meet several more IPython magics in this lecture

Timing Code For scientific calculations, we often need to know how long certain blocks of code take to run. For this purpose, IPython includes the timeit magic. Usage is very straightforward; let's look at an example. In earlier exercises, we wrote two different functions to calculate the value of a polynomial. Let's put them in a file called temp.py as follows

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

3.5. IPYTHON SHELL AND NOTEBOOK

140

## Filename: temp.py
import numpy as np

def p1(x, coef):
    return sum(a * x**i for i, a in enumerate(coef))

def p2(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2,...]
    return np.dot(coef, y)
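As an aside (not part of the lecture), a third option worth knowing is Horner's method, which evaluates the same polynomial with one multiplication per coefficient and no explicit powers:

```python
def p3(x, coef):
    """Evaluate sum of coef[i] * x**i via Horner's method."""
    result = 0
    for a in reversed(coef):
        # Peel off one coefficient per step: result = result * x + a
        result = result * x + a
    return result

print(p3(10, (1, 2)))   # 21, agreeing with p1 and p2
```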

Note that p1 uses pure Python, whereas p2 uses NumPy arrays and should run faster Heres how we can test this
In [1]: run temp.py

In [2]: p1(10, (1, 2))   # Let's make sure the function works OK
Out[2]: 21

In [3]: p2(10, (1, 2))   # Ditto
Out[3]: 21.0

In [4]: coef = np.random.randn(1000)

In [5]: timeit p1(0.9, coef)
1000 loops, best of 3: 1.15 ms per loop

In [6]: timeit p2(0.9, coef)
100000 loops, best of 3: 9.87 us per loop

For p1, average execution time was 1.15 milliseconds, while for p2 it was about 10 microseconds (i.e., millionths of a second), two orders of magnitude faster

Reloading Modules Here is one very common Python gotcha and a nice solution provided by IPython. When we work with multiple files, changes in one file are not always visible in our program. To see this, suppose that you are working with files useful_functions.py and main_program.py. As the names suggest, the main program resides in main_program.py but imports functions from useful_functions.py. You might have noticed that if you make a change to useful_functions.py and then re-run main_program.py, the effect of that change isn't always apparent. Here's an example useful_functions.py in the current directory
## Filename: useful_functions.py
def meaning_of_life():
    "Computes the meaning of life"
    return 42

Here is main_program.py, which imports the former


## Filename: main_program.py
from useful_functions import meaning_of_life

x = meaning_of_life()
print "The meaning of life is: {}".format(x)

When we run main_program.py we get the expected output


In [1]: run main_program.py
The meaning of life is: 42

Now suppose that we discover the meaning of life is actually 43 So we open up a text editor, and change the contents of useful_functions.py to
## Filename: useful_functions.py
def meaning_of_life():
    "Computes the meaning of life"
    return 43

However, if we run main_program.py again, no change is visible


In [2]: run main_program.py
The meaning of life is: 42

The reason is that useful_functions.py has been compiled to a byte code file, in preparation for sending its instructions to the Python virtual machine. The byte code file will be called useful_functions.pyc, and lives in the same directory as useful_functions.py. Even though we've modified useful_functions.py, this change is not reflected in useful_functions.pyc

The nicest way to get your dependencies to recompile is to use IPython's autoreload extension
In [3]: %load_ext autoreload

In [4]: autoreload 2

In [5]: run main_program.py
The meaning of life is: 43

If you want this behavior to load automatically when you start IPython, add these lines to your ipython_config.py file
c.InteractiveShellApp.extensions = ['autoreload']
c.InteractiveShellApp.exec_lines = ['%autoreload 2']


Google "IPython configuration" for more details
Incidentally, if you prefer to do things manually, you can also import and then reload the modified module
In [3]: import useful_functions

In [4]: reload(useful_functions)

For any subsequent changes, you will only need reload(useful_functions)

Debugging
Are you one of those programmers who fills their code with print statements when trying to debug their programs?
Hey, it's OK, we all used to do that
But today might be a good day to turn a new page, and start using a debugger
Debugging is a big topic, but it's actually very easy to learn the basics
The standard Python debugger is pdb, although here we use a slightly fancier one called ipdb that plays well with IPython
Either pdb or ipdb will do the job fine
Let's look at an example of when and how to use them

The debug Magic

Let's consider a simple (and rather contrived) example, where we have a script called temp.py with the following contents
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # Call the function, generate plot

This code is intended to plot the log function over the interval [1, 2]
But there's an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
Here's what happens when we run the code
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176             else:
    177                 filename = fname
--> 178             __builtin__.execfile(filename, *where)

/home/john/temp/temp.py in <module>()
      8     plt.show()
      9
---> 10 plot_log()

/home/john/temp/temp.py in plot_log()
      5     fig, ax = plt.subplots(2, 1)
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()
      9

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array has no plot method
But let's pretend that we don't understand this for the moment
We might suspect there's something wrong with ax, but when we try to investigate this object
In [2]: ax
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-645aedc8a285> in <module>()
----> 1 ax

NameError: name 'ax' is not defined

The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates
Let's try doing it a different way
First we run temp.py again, but this time we respond to the exception by typing debug
This will cause us to be dropped into the Python debugger at the point of execution just before the exception occurs
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176             else:
    177                 filename = fname
--> 178             __builtin__.execfile(filename, *where)

/home/john/temp/temp.py in <module>()
      8     plt.show()
      9
---> 10 plot_log()

/home/john/temp/temp.py in plot_log()
      5     fig, ax = plt.subplots(2, 1)
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()
      9

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

In [2]: debug
> /home/john/temp/temp.py(7)plot_log()
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()

ipdb>

We're now at the ipdb> prompt, at which we can investigate the value of our variables at this point in the program, step forward through the code, etc.
For example, here we simply type the name ax to see what's happening with this object
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
       <matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)

It's now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h

Documented commands (type help <topic>):
========================================
EOF    bt         cont      enable  jump  pdef   r        tbreak   w
a      c          continue  exit    l     pdoc   restart  u        whatis
alias  cl         d         h       list  pinfo  return   unalias  where
args   clear      debug     help    n     pp     run      unt
b      commands   disable   ignore  next  q      s        until
break  condition  down      j       p     quit   step     up

Miscellaneous help topics:
==========================
exec  pdb

Undocumented commands:
======================
retval  rv

ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.

Setting a Break Point

The preceding approach is handy but sometimes insufficient
For example, consider the following modified version of temp.py
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Here the original problem is fixed, but we've accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)
Now there won't be any exception, but the plot will not look right
To use the debugger to investigate, we can add a break point, by inserting the line import ipdb; ipdb.set_trace() in a suitable location
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    import ipdb; ipdb.set_trace()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Now let's run the script, and investigate via the debugger
In [3]: run temp.py
> /home/john/temp/temp.py(6)plot_log()
      5     import ipdb; ipdb.set_trace()
----> 6     fig, ax = plt.subplots()
      7     x = np.logspace(1, 2, 10)

ipdb> n
> /home/john/temp/temp.py(7)plot_log()
      6     fig, ax = plt.subplots()
----> 7     x = np.logspace(1, 2, 10)
      8     ax.plot(x, np.log(x))

ipdb> n
> /home/john/temp/temp.py(8)plot_log()
      7     x = np.logspace(1, 2, 10)
----> 8     ax.plot(x, np.log(x))
      9     plt.show()

ipdb> x
array([  10.        ,   12.91549665,   16.68100537,   21.5443469 ,
         27.82559402,   35.93813664,   46.41588834,   59.94842503,
         77.42636827,  100.        ])

Here we used n twice to step forward through the code (one line at a time), and then printed the value of x to see what was happening with that variable

IPython Notebook
The IPython notebook combines a convenient browser-based interface to IPython with the ability to mix in formatted text and mathematical expressions between cells
Here we cover only the basics

Starting the Notebook

In essence, starting the IPython notebook simply involves opening up a terminal / Powershell and typing ipython notebook
Here's an example (click to enlarge)

Notice the line The IPython Notebook is running at:

http://127.0.0.1:8888/

As you might be aware, http://127.0.0.1 refers to (i.e., is the IP address of) your local machine
The 8888 at the end refers to port number 8888 on your computer
The IPython kernel is now listening on that port
At the same time you see this data, your default browser should open up with a web page that looks something like this (click to enlarge)


What you see here is called the IPython dashboard
If you look at the URL at the top left, you'll see that it's http://127.0.0.1:8888, matching the message above
Sometimes we prefer to start the notebook using ipython notebook --no-browser, which does everything as above except opening the browser
You can then open the browser manually, and enter the URL http://127.0.0.1:8888 (or whatever address comes up in the start-up message)
Assuming all this has worked OK, you can now click on New Notebook and see something like this


The notebook displays an active cell, into which you can type Python commands
To run the commands in a cell, type Shift-Enter instead of the usual Enter

Inline Figures

One of the nice things about IPython notebooks is that figures can be displayed inside the page
To achieve this effect, use the matplotlib inline magic, like so (click to enlarge)

Note that some people use pylab inline instead, or start the notebook with ipython notebook --pylab-inline

We recommend against this; see our previous discussion of pylab

Working with Python Files

How does one run and experiment with an existing Python file using the notebook?
In fact you can do it in the same manner as the IPython shell:
1. navigate to the correct directory
2. use run followed by the file name
However, it's often convenient to be able to see your code
For this purpose we can use the load magic, and then Shift + Enter to execute

To save the contents of a cell as file foo.py, put %%file foo.py as the first line of the cell and then Shift + Enter
(Here %%file is an example of a cell magic)

Documentation

In addition to executing code, the IPython notebook allows you to embed text, equations, figures and even videos between the cells
For example, here we enter text instead of code
Next we select Markdown and then Shift + Enter to produce this
(Click to enlarge and observe the mouse pointer to see where Markdown was selected)
If you're not familiar with it, Markdown is a mark-up language, similar to (but simpler than) LaTeX



Sharing Notebooks

A notebook can easily be saved and shared between users
Notebook files are just text files (structured as JSON) and typically end with .ipynb
You can download a rather uninteresting one here
Try saving the file on your computer and then importing it from the dashboard (the first browser page that opens when you start IPython notebook)
You should see something like this

Of course you can manipulate the contents, save it, share it, etc.
Here are some much more interesting examples of notebooks created by the scientific community

Python in the Cloud


So far we've discussed running Python on your local machine
There's another option that is fun and often convenient: running Python in the cloud
One way to do this is to set up a free account at Wakari
Once you've done that and signed in, you will have the option to open a new IPython notebook
Now, in one of the cells, type
!git clone https://github.com/jstac/quant-econ

and then Shift+Enter
This is the standard Git command to install the main repository, apart from the ! at the start


This ! tells the IPython notebook that the command we are executing is an ordinary shell command, not a Python or IPython command
If this works, you should now have the main repository sitting in your pwd, and you can cd into it and get programming in the same manner described above
The big difference is that your programs are now running on Amazon's massive web service infrastructure!


CHAPTER

FOUR

INTRODUCTORY APPLICATIONS
This section of the course contains relatively simple applications, one purpose of which is to teach you more about the Python programming environment

4.1 Linear Algebra


Overview
One of the single most useful branches of mathematics you can learn is linear algebra
For example, many applied problems in economics, finance, operations research and other fields of science require the solution of a linear system of equations, such as

y_1 = a x_1 + b x_2
y_2 = c x_1 + d x_2

or, more generally,

y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k
\vdots
y_n = a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k     (4.1)

The objective here is to solve for the unknowns x_1, \ldots, x_k given a_{11}, \ldots, a_{nk} and y_1, \ldots, y_n
When considering such problems, it is essential that we first consider at least some of the following questions
- Does a solution actually exist?
- Are there in fact many solutions, and if so how should we interpret them?
- If no solution exists, is there a best approximate solution?
- If a solution exists, how should we compute it?
These are the kinds of topics addressed by linear algebra
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and computation



We admit some overlap with this lecture, where operations on NumPy arrays were first explained
Note that this lecture is more theoretical than most, and contains background material that will be used in applications as we go along

Vectors
A vector of length n is just a sequence (or array, or tuple) of n numbers, which we write as x = (x_1, \ldots, x_n) or x = [x_1, \ldots, x_n]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all n-vectors is denoted by R^n
For example, R^2 is the plane, and a vector in R^2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner

If you're interested, the Python code for producing this figure is here

Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we now describe


As a matter of definition, when we add two vectors, we add them element by element

x + y = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} := \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}

Scalar multiplication is an operation that takes a number \gamma and a vector x and produces

\gamma x := \begin{pmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{pmatrix}

Scalar multiplication is illustrated in the next figure

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more commonly represented as a NumPy array One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax
In [1]: import numpy as np

In [2]: x = np.ones(3)             # Vector of three ones

In [3]: y = np.array((2, 4, 6))    # Converts tuple (2, 4, 6) into array

In [4]: x + y
Out[4]: array([ 3.,  5.,  7.])

In [5]: 4 * x
Out[5]: array([ 4.,  4.,  4.])

Inner Product and Norm

The inner product of vectors x, y \in R^n is defined as

x' y := \sum_{i=1}^n x_i y_i

Two vectors are called orthogonal if their inner product is zero
The norm of a vector x represents its length (i.e., its distance from the zero vector) and is defined as

\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^n x_i^2 \right)^{1/2}

The expression \| x - y \| is thought of as the distance between x and y
Continuing on from the previous example, the inner product and norm can be computed as follows
In [6]: np.sum(x * y)           # Inner product of x and y
Out[6]: 12.0

In [7]: np.sqrt(np.sum(x**2))   # Norm of x, take one
Out[7]: 1.7320508075688772

In [8]: np.linalg.norm(x)       # Norm of x, take two
Out[8]: 1.7320508075688772

Span

Given a set of vectors A := \{a_1, \ldots, a_k\} in R^n, it's natural to think about the new vectors we can create by performing linear operations
New vectors created in this manner are called linear combinations of A
In particular, y \in R^n is a linear combination of A := \{a_1, \ldots, a_k\} if

y = \beta_1 a_1 + \cdots + \beta_k a_k for some scalars \beta_1, \ldots, \beta_k

In this context, the values \beta_1, \ldots, \beta_k are called the coefficients of the linear combination
The set of linear combinations of A is called the span of A
The next figure shows the span of A = \{a_1, a_2\} in R^3
The span is a 2 dimensional plane passing through these two points and the origin
The code for producing this figure can be found here
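Linear combinations are easy to experiment with in NumPy. Here is a small sketch (the vectors and coefficients are our own choices, not from the lecture):

```python
import numpy as np

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.0, 1.0, 0.0])

# A linear combination of a1 and a2 with coefficients 2 and -1
beta1, beta2 = 2.0, -1.0
y = beta1 * a1 + beta2 * a2

print(y)   # a point in the span of {a1, a2}
```

Any vector whose third coordinate is zero can be produced this way, which is exactly the plane spanned by a1 and a2.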


Examples

If A contains only one vector a_1 \in R^2, then its span is just the scalar multiples of a_1, which is the unique line passing through both a_1 and the origin
If A = \{e_1, e_2, e_3\} consists of the canonical basis vectors of R^3, that is

e_1 := \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 := \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 := \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

then the span of A is all of R^3, because, for any x = (x_1, x_2, x_3) \in R^3, we can write

x = x_1 e_1 + x_2 e_2 + x_3 e_3

Now consider A_0 = \{e_1, e_2, e_1 + e_2\}
If y = (y_1, y_2, y_3) is any linear combination of these vectors, then y_3 = 0 (check it)
Hence A_0 fails to span all of R^3

Linear Independence

As we'll see, it's often desirable to find families of vectors with relatively large span, so that many vectors can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is what's called linear independence
In particular, a collection of vectors A := \{a_1, \ldots, a_k\} in R^n is said to be
- linearly dependent if some strict subset of A has the same span as A
- linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span, and linearly dependent otherwise


To illustrate the idea, recall the figure that showed the span of vectors \{a_1, a_2\} in R^3 as a plane through the origin
If we take a third vector a_3 and form the set \{a_1, a_2, a_3\}, this set will be
- linearly dependent if a_3 lies in the plane
- linearly independent otherwise
As another illustration of the concept, since R^n can be spanned by n vectors (see the discussion of canonical basis vectors above), any collection of m > n vectors in R^n must be linearly dependent
The following statements are equivalent to linear independence of A := \{a_1, \ldots, a_k\} \subset R^n
1. No vector in A can be formed as a linear combination of the other elements
2. If \beta_1 a_1 + \cdots + \beta_k a_k = 0 for scalars \beta_1, \ldots, \beta_k, then \beta_1 = \cdots = \beta_k = 0
(The zero in the second expression is the origin of R^n)

Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span has a unique representation as a linear combination of these vectors
In other words, if A := \{a_1, \ldots, a_k\} \subset R^n is linearly independent and

y = \beta_1 a_1 + \cdots + \beta_k a_k

then no other coefficient sequence \gamma_1, \ldots, \gamma_k will produce the same vector y
Indeed, if we also have y = \gamma_1 a_1 + \cdots + \gamma_k a_k, then

(\beta_1 - \gamma_1) a_1 + \cdots + (\beta_k - \gamma_k) a_k = 0

Linear independence now implies \gamma_i = \beta_i for all i

Matrices
Matrices are a neat way of organizing data for use in linear operations
An n \times k matrix is a rectangular array A of numbers with n rows and k columns:

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{pmatrix}

Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this lecture
For obvious reasons, the matrix A is also called a vector if either n = 1 or k = 1
In the former case, A is called a row vector, while in the latter it is called a column vector
If n = k, then A is called square


The matrix formed by replacing a_{ij} by a_{ji} for every i and j is called the transpose of A, and denoted A' or A^T
If A = A', then A is called symmetric
For a square matrix A, the elements of the form a_{ii} for i = 1, \ldots, n are called the principal diagonal
A is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then A is called the identity matrix, and denoted by I

Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:

\gamma A = \gamma \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{pmatrix} := \begin{pmatrix} \gamma a_{11} & \cdots & \gamma a_{1k} \\ \vdots & & \vdots \\ \gamma a_{n1} & \cdots & \gamma a_{nk} \end{pmatrix}

and

A + B = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{pmatrix} + \begin{pmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} \end{pmatrix} := \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk} \end{pmatrix}

In the latter case, the matrices must have the same shape in order for the definition to make sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above, and is designed to make multiplication play well with basic linear operations
If A and B are two matrices, then their product AB is formed by taking as its i, j-th element the inner product of the i-th row of A and the j-th column of B
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the Wikipedia page
If A is n \times k and B is j \times m, then to multiply A and B we require k = j, and the resulting matrix AB is n \times m
As perhaps the most important special case, consider multiplying n \times k matrix A and k \times 1 column vector x
According to the preceding rule, this gives us an n \times 1 column vector

Ax = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} := \begin{pmatrix} a_{11} x_1 + \cdots + a_{1k} x_k \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nk} x_k \end{pmatrix}     (4.2)


Note: AB and BA are not generally the same thing
Another important special case is the identity matrix
You should check that if A is n \times k and I is the k \times k identity matrix, then AI = A
If I is the n \times n identity matrix, then IA = A

Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations
You can create them manually from tuples of tuples (or lists of lists) as follows
In [1]: import numpy as np

In [2]: A = ((1, 2),
   ...:      (3, 4))

In [3]: type(A)
Out[3]: tuple

In [4]: A = np.array(A)

In [5]: type(A)
Out[5]: numpy.ndarray

In [6]: A.shape
Out[6]: (2, 2)

The shape attribute is a tuple giving the number of rows and columns; see here for more discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenience functions for creating common matrices (matrices of zeros, ones, etc.); see here
Since operations are performed elementwise by default, scalar multiplication and addition have very natural syntax
In [8]: A = np.identity(3)

In [9]: B = np.ones((3, 3))

In [10]: 2 * A
Out[10]:
array([[ 2.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  0.,  2.]])

In [11]: A + B
Out[11]:
array([[ 2.,  1.,  1.],
       [ 1.,  2.,  1.],
       [ 1.,  1.,  2.]])

(Although there is a specialized matrix data type defined in NumPy, it's more standard to work with ordinary NumPy arrays)
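A few of the operations just mentioned, gathered in one place (a sketch with made-up values; np.zeros and np.identity are among the convenience functions referred to above):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

print(A.T)                  # transpose, same as A.transpose()

Z = np.zeros((2, 2))        # 2 x 2 matrix of zeros
I = np.identity(2)          # 2 x 2 identity matrix

# AI = A, as discussed above
print(np.allclose(np.dot(A, I), A))   # True
```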

To multiply matrices we use np.dot
In particular, np.dot(A, B) is matrix multiplication, whereas A * B is element by element multiplication
See here for more discussion

Matrices as Maps

Each n \times k matrix A can be identified with a function f(x) = Ax that maps x \in R^k into y = Ax \in R^n
These kinds of functions have a special property: they are linear
A function f : R^k \to R^n is called linear if, for all x, y \in R^k and all scalars \alpha, \beta, we have

f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)

You can check that this holds for the function f(x) = Ax + b when b is the zero vector, and fails when b is nonzero
In fact, it's known that f is linear if and only if there exists a matrix A such that f(x) = Ax for all x
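Both the np.dot versus * distinction and the linearity property can be checked directly. In this sketch (our own values), we verify f(αx + βy) = αf(x) + βf(y) for f(x) = Ax, and see that adding a nonzero constant b destroys linearity:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

def f(x):
    return np.dot(A, x)      # matrix multiplication, not elementwise

x = np.array([1.0, 2.0])
y = np.array([-1.0, 4.0])
alpha, beta = 0.3, -2.0

print(np.allclose(f(alpha * x + beta * y), alpha * f(x) + beta * f(y)))  # True

# With a nonzero shift b, the same identity fails
b = np.array([1.0, 1.0])
g = lambda z: np.dot(A, z) + b
print(np.allclose(g(alpha * x + beta * y), alpha * g(x) + beta * g(y)))  # False
```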

Solving Systems of Equations


Recall again the system of equations (4.1)
If we compare (4.1) and (4.2), we see that (4.1) can now be written more conveniently as

y = Ax     (4.3)

The problem we face is to determine a vector x \in R^k that solves (4.3), taking y and A as given
This is a special case of a more general problem: find an x such that y = f(x)
Given an arbitrary function f and a y, is there always an x such that y = f(x)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In the first plot there are multiple solutions, as the function is not one-to-one, while in the second there are no solutions, since y lies outside the range of f
Can we impose conditions on A in (4.3) that rule out these problems?
In this context, the most important thing to recognize about the expression Ax is that it corresponds to a linear combination of the columns of A
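This claim is easy to confirm numerically. A quick sketch (the numbers are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

a1, a2 = A[:, 0], A[:, 1]       # the columns of A

# Ax equals x1 * a1 + x2 * a2
print(np.allclose(np.dot(A, x), x[0] * a1 + x[1] * a2))  # True
```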


In particular, if a_1, \ldots, a_k are the columns of A, then

Ax = x_1 a_1 + \cdots + x_k a_k

Hence the range of f(x) = Ax is exactly the span of the columns of A
We want the range to be large, so that it contains arbitrary y
As you might recall, the condition that we want for the span to be large is linear independence
A happy fact is that linear independence of the columns of A also gives us uniqueness
Indeed, it follows from our earlier discussion that if \{a_1, \ldots, a_k\} is linearly independent and y = Ax = x_1 a_1 + \cdots + x_k a_k, then no z \neq x satisfies y = Az

The n \times n Case

Let's discuss some more details, starting with the case where A is n \times n
This is the familiar case where the number of unknowns and equations is the same
For arbitrary y \in R^n, we hope to find a unique x \in R^n such that y = Ax
In view of the observations immediately above, if the columns of A are linearly independent, then their span, and hence the range of f(x) = Ax, is all of R^n
Hence there always exists an x such that y = Ax
Moreover, the solution is unique
In particular, the following are equivalent
1. The columns of A are linearly independent
2. For any y \in R^n, the equation y = Ax has a unique solution
The property of having linearly independent columns is sometimes expressed as having full column rank

Inverse Matrices

Can we give some sort of expression for the solution?
If y and A are scalar with A \neq 0, then the solution is x = A^{-1} y
A similar expression is available in the matrix case
In particular, if square matrix A has full column rank, then it possesses a multiplicative inverse matrix A^{-1}, with the property that A A^{-1} = A^{-1} A = I
As a consequence, if we pre-multiply both sides of y = Ax by A^{-1}, we get x = A^{-1} y
This is the solution that we're looking for

Determinants

Another quick comment about square matrices is that to every such matrix we assign a unique number called the determinant of the matrix; you can find the expression for it here
If the determinant of A is not zero, then we say that A is nonsingular
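The connection between a nonzero determinant and invertibility is easy to check with NumPy (a sketch of ours; SciPy's det and inv, used later in this lecture, behave the same way):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.det(A))          # close to -2.0: nonzero, so A is nonsingular

A_inv = np.linalg.inv(A)
I = np.identity(2)

# A A^{-1} = A^{-1} A = I
print(np.allclose(np.dot(A, A_inv), I))   # True
print(np.allclose(np.dot(A_inv, A), I))   # True
```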


Perhaps the most important fact about determinants is that A is nonsingular if and only if A is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be inverted

More Rows than Columns

This is the n \times k case with n > k
This case is very important in many settings, not least in the setting of linear regression (where n is the number of observations, and k is the number of explanatory variables)
Given arbitrary y \in R^n, we seek an x \in R^k such that y = Ax
In this setting, existence of a solution is highly unlikely
Without much loss of generality, let's go over the intuition focusing on the case where the columns of A are linearly independent
It follows that the span of the columns of A is a k-dimensional subspace of R^n
This span is very unlikely to contain arbitrary y \in R^n

To see why, recall the figure above, where k = 2 and n = 3
Imagine an arbitrarily chosen y \in R^3, located somewhere in that three dimensional space
What's the likelihood that y lies in the span of \{a_1, a_2\} (i.e., the two dimensional plane through these points)?
In a sense it must be very small, since this plane has zero thickness
As a result, in the n > k case we usually give up on existence
However, we can still think of the best possible approximation, which is the x that makes the distance \| y - Ax \| as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be \hat{x} = (A'A)^{-1} A' y; see for example chapter 3 of these notes

More Columns than Rows

This is the n \times k case with n < k, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many; in other words, uniqueness never holds
For example, consider the case where k = 3 and n = 2
Thus, the columns of A consist of 3 vectors in R^2
This set can never be linearly independent, since 2 vectors are enough to span R^2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, let's say that a_1 = a_2 + a_3


Then if y = Ax = x_1 a_1 + x_2 a_2 + x_3 a_3, we can also write

y = x_1 (a_2 + a_3) + x_2 a_2 + x_3 a_3 = (x_1 + x_2) a_2 + (x_1 + x_3) a_3

In other words, uniqueness fails

Linear Equations with SciPy

Here's an illustration of how to solve linear equations with SciPy's linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code
In [9]: import numpy as np

In [10]: from scipy.linalg import inv, solve, det

In [11]: A = ((1, 2), (3, 4))

In [12]: A = np.array(A)

In [13]: y = np.ones((2, 1))   # Column vector

In [14]: det(A)                # Check that A is nonsingular, and hence invertible
Out[14]: -2.0

In [15]: A_inv = inv(A)        # Compute the inverse

In [16]: A_inv
Out[16]:
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [17]: x = np.dot(A_inv, y)  # Solution

In [18]: np.dot(A, x)          # Should equal y
Out[18]:
array([[ 1.],
       [ 1.]])

In [19]: solve(A, y)           # Produces same solution
Out[19]:
array([[-1.],
       [ 1.]])

Observe how we can solve for x = A^{-1} y either via np.dot(inv(A), y) or by using solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more stable, and hence should almost always be preferred
To obtain the least squares solution \hat{x} = (A'A)^{-1} A' y, use scipy.linalg.lstsq(A, y)
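The least squares formula can be compared against a library routine. The sketch below uses NumPy's np.linalg.lstsq (a close relative of the scipy.linalg.lstsq routine mentioned above) on a small overdetermined system of our own:

```python
import numpy as np

# Three equations, two unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Least squares via the normal equations: x_hat = (A'A)^{-1} A'y
x_hat = np.dot(np.linalg.inv(np.dot(A.T, A)), np.dot(A.T, y))

# Least squares via the library routine
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]

print(np.allclose(x_hat, x_ls))   # True: both give the same approximation
```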

Eigenvalues and Eigenvectors


Let A be an n \times n square matrix


If \lambda is scalar and v is a vector in R^n such that

Av = \lambda v

then we say that \lambda is an eigenvalue of A, and v is an eigenvector
Thus, an eigenvector of A is a vector such that when the map f(x) = Ax is applied, v is merely scaled
The next figure shows two eigenvectors (blue arrows) and their images under A (red arrows)
As expected, the image Av of each v is just a scaled version of the original

The eigenvalue equation is equivalent to (A - \lambda I) v = 0, and this has a nonzero solution v only when the columns of A - \lambda I are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for \lambda such that the determinant of A - \lambda I is zero
This problem can be expressed as one of solving for the roots of a polynomial of degree n in \lambda
This in turn implies the existence of n solutions in the complex plane, although some might be repeated
Some nice facts about the eigenvalues of a square matrix A are as follows
1. The determinant of A is equal to the product of the eigenvalues
2. The trace of A (the sum of the elements on the principal diagonal) is equal to the sum of the eigenvalues
3. If A is symmetric, then all its eigenvalues are real


4. If A is invertible and \lambda_1, \ldots, \lambda_n are its eigenvalues, then the eigenvalues of A^{-1} are 1/\lambda_1, \ldots, 1/\lambda_n
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [1]: import numpy as np

In [2]: from scipy.linalg import eig

In [3]: A = ((1, 2),
   ...:      (2, 1))

In [4]: A = np.array(A)

In [5]: evals, evecs = eig(A)

In [6]: evals
Out[6]: array([ 3.+0.j, -1.+0.j])

In [7]: evecs
Out[7]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])

Note that the columns of evecs are the eigenvectors. Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check it), this routine normalizes the length of each eigenvector to one.

Generalized Eigenvalues It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices A and B, seeks generalized eigenvalues λ and eigenvectors v such that

Av = λBv

This can be solved in SciPy via scipy.linalg.eig(A, B). Of course if B is square and invertible, then we can treat the generalized eigenvalue problem as an ordinary eigenvalue problem B^{-1} Av = λv, but this is not always the case
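As a quick sanity check, a sketch verifying that each pair returned in the session above satisfies Av = λv and that the eigenvectors have unit length:

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1, 2], [2, 1]])
evals, evecs = eig(A)

for i in range(len(evals)):
    lam = evals[i]
    v = evecs[:, i]                      # i-th eigenvector is a column
    assert np.allclose(np.dot(A, v), lam * v)   # Av = lambda v
    assert np.isclose(np.linalg.norm(v), 1.0)   # normalized to length one
```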

Further Topics
We round out our discussion by briefly mentioning several other important topics.

Series Expansions Recall the usual summation formula for a geometric progression, which states that if |a| < 1, then Σ_{k=0}^∞ a^k = (1 - a)^{-1}. A generalization of this idea exists in the matrix setting


To state it, let A be a square matrix, and let

‖A‖ := max_{‖x‖ = 1} ‖Ax‖

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm (in this case, the so-called spectral norm). For example, if ‖A‖ < 1, then A is contractive, in the sense that it pulls points on the surface of the unit sphere (and hence everywhere else) towards the origin. In addition, let A^k := A A^{k-1} with A^1 := A, so that A^k is the k-th power of A. The Neumann theorem states the following: If ‖A‖ < 1, then I - A is invertible, and

(I - A)^{-1} = Σ_{k=0}^∞ A^k
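The Neumann series is easy to illustrate numerically; a sketch with an arbitrarily chosen A whose spectral norm is below one:

```python
import numpy as np

A = np.array([[0.1, 0.4],
              [0.2, 0.3]])
# the spectral norm is the largest singular value; here it is < 1
assert np.linalg.norm(A, 2) < 1

# partial sums of I + A + A^2 + ... approximate (I - A)^{-1}
S = np.zeros((2, 2))
term = np.identity(2)          # A^0 = I
for k in range(50):
    S = S + term
    term = np.dot(term, A)     # next power of A

close = np.allclose(S, np.linalg.inv(np.identity(2) - A))
```

Fifty terms is plenty here, since the powers of A decay geometrically.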

Positive Definite Matrices Let A be an n × n matrix. We say that A is

1. positive definite if x'Ax > 0 for every x ∈ ℝⁿ \ {0}
2. positive semi-definite or nonnegative definite if x'Ax ≥ 0 for every x ∈ ℝⁿ

Analogous definitions exist for negative definite and negative semi-definite matrices. It is notable that if A is positive definite, then all its eigenvalues are strictly positive, and hence A is invertible (with positive definite inverse).

Differentiating Linear and Quadratic Forms The following formulas are useful in many economic contexts. Let z, x and a all be n × 1 vectors, A be an n × n matrix, B be an m × n matrix and y be an m × 1 vector. Then

1. ∂(a'x)/∂x = a
2. ∂(Ax)/∂x = A'
3. ∂(x'Ax)/∂x = (A + A')x
4. ∂(y'Bz)/∂y = Bz
5. ∂(y'Bz)/∂B = yz'
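These identities can be spot-checked numerically; here is a sketch comparing formula 3 against a central finite-difference gradient (the matrix and evaluation point are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
x = np.array([0.5, -1.0])

f = lambda x: np.dot(x, np.dot(A, x))    # f(x) = x'Ax

# analytic gradient from formula 3: (A + A')x
grad = np.dot(A + A.T, x)

# central finite differences
h = 1e-6
num = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = h
    num[i] = (f(x + e) - f(x - e)) / (2 * h)

ok = np.allclose(grad, num, atol=1e-5)
```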


An Example Let x be a given n × 1 vector and consider the problem

v(x) = max_{y, u} { -y'Py - u'Qu }

subject to the linear constraint

y = Ax + Bu

Here

- P is an n × n matrix and Q is an m × m matrix
- A is an n × n matrix and B is an n × m matrix
- both P and Q are symmetric and positive semidefinite

Question: what must the dimensions of y and u be to make this a well-posed problem? One way to solve the problem is to form the Lagrangian

L = -y'Py - u'Qu + λ'[Ax + Bu - y]

where λ is an n × 1 vector of Lagrange multipliers. Try applying the above formulas on differentiating quadratic and linear forms to obtain the first-order conditions for maximizing L with respect to y, u and minimizing it with respect to λ. Show that these conditions imply that

1. λ = -2Py
2. The optimizing choice of u satisfies u = -(Q + B'PB)^{-1} B'PAx
3. The function v satisfies v(x) = -x'P̃x, where P̃ = A'PA - A'PB(Q + B'PB)^{-1} B'PA

As we will see, in economic contexts the vector of Lagrange multipliers often has an interpretation as shadow prices

Note: If we don't care about the Lagrange multipliers, we can substitute the constraint into the objective function, and then just maximize -(Ax + Bu)'P(Ax + Bu) - u'Qu with respect to u. You can verify that this leads to the same maximizer.

Further Reading The documentation of the scipy.linalg submodule can be found here. Chapter 2 of these notes contains a discussion of linear algebra along the same lines as above, with solved exercises. If you don't mind a slightly abstract approach, a nice intermediate-level read on linear algebra is [Janich1994]


4.2 Finite Markov Chains


Overview
Markov chains are one of the most fundamental classes of stochastic processes. Attributes:

- simple, flexible and supported by many elegant theoretical results
- valuable for building intuition about random dynamic models
- very useful in their own right

You will find them in many of the workhorse models of economics and finance. In this lecture we review some of the theory of Markov chains, with a focus on numerical methods. Prerequisite knowledge is basic probability and linear algebra

Definitions
The following concepts are fundamental.

Stochastic Matrices A stochastic matrix (or Markov matrix) is an n × n square matrix P = P[i, j] such that

1. each element P[i, j] is nonnegative, and
2. each row P[i, ·] sums to one

Let S := {0, …, n - 1}. Evidently, each row P[i, ·] can be regarded as a distribution (probability mass function) on S. It is not difficult to check² that if P is a stochastic matrix, then so is the n-th power Pⁿ for all n ∈ ℕ.

Markov Chains A stochastic matrix describes the dynamics of a Markov chain {X_t} that takes values in the state space S. Formally, we say that a discrete time stochastic process {X_t} taking values in S is a Markov chain with stochastic matrix P if

ℙ{X_{t+1} = j | X_t = i} = P[i, j]

for any t ≥ 0 and i, j ∈ S; here ℙ means probability. Remark: A stochastic process {X_t} is said to have the Markov property if

ℙ{X_{t+1} | X_t} = ℙ{X_{t+1} | X_t, X_{t-1}, …}

so that the state X_t is a complete description of the current position of the system
² Hint: First show that if P and Q are stochastic matrices then so is their product; to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that Pⁿ is a stochastic matrix using induction.
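This claim is also easy to verify numerically; a sketch using the 3 × 3 recession matrix estimated by Hamilton, which appears in Example 2 below:

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

# every power of P should again have nonnegative entries and unit row sums
Q = np.identity(3)
for n in range(1, 6):
    Q = np.dot(Q, P)                     # Q = P^n
    assert (Q >= 0).all()
    assert np.allclose(Q.sum(axis=1), 1.0)
```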


Thus, by construction,

- P[i, j] is the probability of going from i to j in one unit of time (one step)
- P[i, ·] is the conditional distribution of X_{t+1} given X_t = i

Another way to think about this process is to imagine that, when X_t = i, the next value X_{t+1} is drawn from the i-th row P[i, ·]. Rephrasing this using more algorithmic language: at each t, the new state X_{t+1} is drawn from P[X_t, ·].

Example 1 Consider a worker who, at any given time t, is either unemployed (state 0) or employed (state 1). Let's write this mathematically as X_t = 0 or X_t = 1. Suppose that, over a one month period,

1. An employed worker loses her job and becomes unemployed with probability β ∈ (0, 1)
2. An unemployed worker finds a job with probability α ∈ (0, 1)

In terms of a stochastic matrix, this tells us that P[0, 1] = α and P[1, 0] = β, or

P = [[1 - α,    α  ],
     [  β,   1 - β ]]

Once we have the values α and β, we can address a range of questions, such as

- What is the average duration of unemployment?
- Over the long-run, what fraction of time does a worker find herself unemployed?
- Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?

We'll cover such applications below.

Example 2 Using US unemployment data, [Hamilton2005] estimated the stochastic matrix

P := [[0.971, 0.029, 0.000],
      [0.145, 0.778, 0.077],
      [0.000, 0.508, 0.492]]

where

- the frequency is monthly
- state 0 represents normal growth
- state 1 represents mild recession
- state 2 represents severe recession


For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month with probability 0.97 In general, large values on the main diagonal indicate persistence in the process { Xt } This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities

Here nr is normal growth, mr is mild recession, etc. This kind of visual presentation helps to build intuition for processes with a small number of states

Simulation
One of the most natural ways to answer questions about Markov chains is to simulate them. (As usual, to approximate the probability of event E, we can simulate many times and count the fraction of times that E occurs.) To simulate a Markov chain, we need its stochastic matrix P and an initial probability distribution ψ. Here ψ is a probability distribution on S with the interpretation that X_0 is drawn from ψ. The Markov chain is then constructed via the following two rules

1. At time t = 0, the initial state X_0 is drawn from ψ
2. At each subsequent time t, the new state X_{t+1} is drawn from P[X_t, ·]

In order to implement this simulation procedure, we need a function for generating draws from a given discrete distribution. We already have this functionality in hand, in the discrete_rv module we wrote in this exercise. The module can be found in the main repository, and defines a class discreteRV that can be used as follows
In [64]: run discrete_rv.py

In [65]: psi = (0.1, 0.9)

In [66]: d = discreteRV(psi)

In [67]: d.draw(5)
Out[67]: array([0, 1, 1, 1, 1])


Here psi is understood to be a discrete distribution on the set of outcomes 0, ..., len(psi) - 1, and d.draw(5) generates 5 independent draws from this distribution. Let's now write a function that generates time series from a specified pair P, ψ. Our function will take the following three arguments

- A stochastic matrix P
- An initial state or distribution init
- A positive integer sample_size representing the length of the time series the function should return

Let's allow init to either be an integer in 0, …, n - 1 providing a fixed starting value for X_0, or a discrete distribution on this same set that corresponds to the initial distribution. In the latter case, a random starting value for X_0 is drawn from the distribution init. The function should return a time series (sample path) of length sample_size. Here's one solution to this problem, in file mc_tools.py from the main repository
def sample_path(P, init=0, sample_size=1000):
    """
    Generates one sample path from a finite Markov chain with (n x n)
    Markov matrix P on state space S = {0,...,n-1}.

    Parameters:

        * P is a nonnegative 2D NumPy array with rows that sum to 1
        * init is either an integer in S or a nonnegative array of
          length n with elements that sum to 1
        * sample_size is an integer

    If init is an integer, the integer is treated as the deterministic
    initial condition.  If init is a distribution on S, then X_0 is
    drawn from this distribution.

    Returns: A NumPy array containing the sample path
    """
    # === set up array to store output === #
    X = np.empty(sample_size, dtype=int)
    if isinstance(init, int):
        X[0] = init
    else:
        X[0] = discreteRV(init).draw()
    # === turn each row into a distribution === #
    # In particular, let P_dist[i] be the distribution corresponding to
    # the i-th row P[i,:]
    n = len(P)
    P_dist = [discreteRV(P[i,:]) for i in range(n)]
    # === generate the sample path === #
    for t in range(sample_size - 1):
        X[t+1] = P_dist[X[t]].draw()
    return X

To test our solution, let's use the small matrix

P := [[0.4, 0.6],
      [0.2, 0.8]]

That is,

P = np.array([[.4, .6], [.2, .8]])

It happens to be true that, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25; we'll see why later on

In [86]: run mc_tools.py

In [87]: P = np.array([[.4, .6], [.2, .8]])

In [88]: s = sample_path(P, init=(0.5, 0.5), sample_size=100000)

In [89]: (s == 0).mean()
Out[89]: 0.24975   # Should be about 0.25

A final comment on the function sample_path is that the code is not particularly fast, mainly because we are using explicit looping. Some justification is provided by the following mantra

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth)

In addition, removing the explicit loop is a non-trivial problem; we will come back to it later on

Marginal Distributions
Suppose that

1. {X_t} is a Markov chain with stochastic matrix P
2. the distribution of X_t is known to be ψ_t

What then is the distribution of X_{t+1}, or, more generally, of X_{t+m}? (Motivation for this problem is given below)


Solution Let's consider how to solve for the distribution ψ_{t+m} of X_{t+m}, beginning with the case m = 1. Throughout, ψ_t will refer to the distribution of X_t for all t. Hence our first aim is to find ψ_{t+1} given ψ_t and P. To begin, pick any j ∈ S. Using the law of total probability, we can decompose the probability that X_{t+1} = j as follows:

ℙ{X_{t+1} = j} = Σ_{i ∈ S} ℙ{X_{t+1} = j | X_t = i} · ℙ{X_t = i}

(In words, to get the probability of being at j tomorrow, we account for all ways this can happen and sum their probabilities.) Rewriting this statement in terms of marginal and conditional probabilities gives

ψ_{t+1}[j] = Σ_{i ∈ S} P[i, j] ψ_t[i]

There are n such equations, one for each j ∈ S. If we think of ψ_{t+1} and ψ_t as row vectors, these n equations are summarized by the matrix expression

ψ_{t+1} = ψ_t P

In other words, to move the distribution forward one unit of time, we postmultiply by P. By repeating this m times we move forward m steps into the future. Hence ψ_{t+m} = ψ_t P^m is also valid; here P^m is the m-th power of P. As a special case, we see that if ψ_0 is the initial distribution from which X_0 is drawn, then ψ_0 P^m is the distribution of X_m. Important convention: In the Markov chain literature, distributions are row vectors unless stated otherwise.

Example: Future Probabilities Recall the stochastic matrix P for recession and growth considered above. Suppose that the current state is unknown; perhaps statistics are available only at the end of the current month. We estimate the probability that the economy is in state i to be ψ[i]. The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner product

ψ P^6 · (0, 1, 1)'
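That inner product is easy to compute directly; a sketch using Hamilton's matrix and a hypothetical current estimate ψ (the numbers in psi are illustrative, not from the lecture):

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

psi = np.array([0.2, 0.4, 0.4])      # hypothetical current estimate

# distribution six months ahead: psi P^6
psi6 = np.dot(psi, np.linalg.matrix_power(P, 6))

# probability of recession (state 1 or 2): inner product with (0, 1, 1)
prob_recession = np.dot(psi6, np.array([0.0, 1.0, 1.0]))
```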


Example 2: Cross-Sectional Distributions Recall our model of employment / unemployment dynamics for a given worker discussed above. Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime experiences are described by the specified dynamics, independently of one another. Let ψ be the current cross-sectional distribution over {0, 1}; for example, ψ[0] is the unemployment rate. The cross-sectional distribution records the fractions of workers employed and unemployed at a given moment. The same distribution also describes the fractions of a particular worker's career spent being employed and unemployed, respectively

Stationary Distributions
As stated in the previous section, we can shift probabilities forward one unit of time via postmultiplication by P. Some distributions are invariant under this updating process; for example,

In [2]: P = np.array([[.4, .6], [.2, .8]])

In [3]: psi = (0.25, 0.75)

In [4]: np.dot(psi, P)
Out[4]: array([ 0.25,  0.75])

Such distributions are called stationary, or invariant. Formally, a distribution ψ* on S is called stationary for P if ψ* = ψ* P. From this equality we immediately get ψ* = ψ* P^t for all t. This tells us an important fact: If the distribution of X_0 is a stationary distribution, then X_t will have this same distribution for all t. Hence stationary distributions have a natural interpretation as stochastic steady states; we'll discuss this more in just a moment. Mathematically, a stationary distribution is just a fixed point of P when P is thought of as the map ψ → ψP from (row) vectors to (row) vectors. At least one such distribution exists for each stochastic matrix P (apply Brouwer's fixed point theorem, or see EDTC, theorem 4.3.5). There may in fact be many stationary distributions corresponding to a given stochastic matrix P; for example, if P is the identity matrix, then all distributions are stationary. One sufficient condition for uniqueness is uniform ergodicity:

Def. Stochastic matrix P is called uniformly ergodic if there exists a positive integer m such that all elements of P^m are strictly positive


For further details on uniqueness and uniform ergodicity, see, for example, EDTC, theorem 4.3.18

Example Recall our model of employment / unemployment dynamics for a given worker discussed above. Assuming α ∈ (0, 1) and β ∈ (0, 1), the uniform ergodicity condition is satisfied. Let ψ* = (p, 1 - p) be the stationary distribution, so that p corresponds to unemployment (state 0). Using ψ* = ψ* P and a bit of algebra yields

p = β / (α + β)

This is, in some sense, a steady state probability of unemployment; more on interpretation below. Not surprisingly it tends to zero as β → 0, and to one as α → 0.

Stationary Distributions by Matrix Inversion Let's suppose that P has a unique stationary distribution ψ*, and consider how to calculate it. From the definition of stationarity, one option would be to solve the linear system ψ*(I_n - P) = 0 for ψ*, where I_n is the n × n identity. But this does not impose the restriction that the solution is a probability distribution; for example, the zero vector solves ψ*(I_n - P) = 0. The restriction that ψ* is a probability distribution can be imposed by working instead with the system

ψ*(I_n - P + B) = b    (4.4)

Here ψ* is the unknown (column) vector, B is an n × n matrix of ones and b is a column vector of ones. You can verify that if ψ* solves this system then its elements must necessarily sum to one³. Here's a function that takes P as a parameter and returns the stationary distribution using this technique; it can be found in the file mc_tools.py in the main repository
³ Hint: Premultiply by a row vector of ones.

import numpy as np
from discrete_rv import discreteRV

def compute_stationary(P):
    """
    Computes the stationary distribution of Markov matrix P.

    Parameters:

        * P is a square 2D NumPy array

    Returns: A flat array giving the stationary distribution
    """
    n = len(P)                                # P is n x n
    I = np.identity(n)                        # Identity matrix
    B, b = np.ones((n, n)), np.ones((n, 1))   # Matrix and vector of ones
    A = np.transpose(I - P + B)
    solution = np.linalg.solve(A, b)
    return solution.flatten()                 # Return a flat array

Let's test it using the matrix P = np.array([[.4, .6], [.2, .8]]), the unique stationary distribution of which is (0.25, 0.75)

In [10]: run mc_tools.py

In [11]: P = np.array([[.4, .6], [.2, .8]])

In [12]: compute_stationary(P)
Out[12]: array([ 0.25,  0.75])
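The analytic formula p = β/(α + β) from the worker example can also be confirmed numerically; a sketch with arbitrarily chosen α and β:

```python
import numpy as np

alpha, beta = 0.3, 0.1               # arbitrary values in (0, 1)
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

p = beta / (alpha + beta)            # analytic unemployment probability
psi_star = np.array([p, 1 - p])

# stationarity: psi* P = psi*
assert np.allclose(np.dot(psi_star, P), psi_star)

# iterating psi -> psi P from any starting distribution converges to psi*
psi = np.array([1.0, 0.0])
for t in range(200):
    psi = np.dot(psi, P)
```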

Convergence to Stationarity Let P be a stochastic matrix such that the uniform ergodicity assumption is valid. We know that under this condition there is a unique stationary distribution ψ*. In fact, under the same condition, we have another important result: for any nonnegative row vector ψ summing to one (i.e., a distribution),

ψ P^t → ψ*  as t → ∞    (4.5)

In view of our preceding discussion, this states that the distribution of X_t converges to ψ*, regardless of the distribution of X_0. This adds considerable weight to our interpretation of ψ* as a stochastic steady state. For one of several well-known proofs, see EDTC, theorem 4.3.18. The convergence in (4.5) is illustrated in the next figure

- Here P is the stochastic matrix for recession and growth considered above
- The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as a vector in ℝ³
- The other red dots are the distributions ψ P^t for t = 1, 2, …
- The black dot is ψ*

The code for the figure can be found in the file mc_convergence_plot.py in the main repository; you might like to try experimenting with different initial conditions

Ergodicity
Under the very same condition of uniform ergodicity, yet another important result obtains: If

1. {X_t} is a Markov chain with stochastic matrix P
2. P is uniformly ergodic with stationary distribution ψ*

then, for all j ∈ S,

(1/n) Σ_{t=1}^n 1{X_t = j} → ψ*[j]  as n → ∞    (4.6)

Here

- 1{X_t = j} = 1 if X_t = j and zero otherwise
- convergence is with probability one
- the result does not depend on the distribution (or value) of X_0

The result tells us that the fraction of time the chain spends at state j converges to ψ*[j] as time goes to infinity. This gives us another way to interpret the stationary distribution, provided that the convergence result in (4.6) is valid. Technically, the convergence in (4.6) is a special case of a law of large numbers result for Markov chains; see EDTC, section 4.3.4 for details.

Example Recall our cross-sectional interpretation of the employment / unemployment model discussed above. Assume that α ∈ (0, 1) and β ∈ (0, 1), so the uniform ergodicity condition is satisfied. We saw that the stationary distribution is (p, 1 - p), where

p = β / (α + β)

In the cross-sectional interpretation, this is the fraction of people unemployed. In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed. Thus, in the long-run, cross-sectional averages for a population and time-series averages for a given person coincide. This is one interpretation of the notion of ergodicity

Exercises
Exercise 1 According to the discussion immediately above, if a workers employment dynamics obey the stochastic matrix 1 P= 1 with (0, 1) and (0, 1), then, in the long-run, the fraction of time spent unemployed will be p := +

n p as n , In other words, if { Xt } is represents the Markov chain for employment, then X where n n : = 1 1 { Xt = 0} X n t =1 Your exercise is to illustrate this convergence First, generate one simulated time series { Xt } of length 10,000, starting at X0 = 0 n p against n, where p is as dened above plot X Second, repeat the rst step, but this time taking X0 = 1 In both cases, set = = 0.1 The result should look something like the following modulo randomness, of course (You dont need to add the fancy touches to the graphsee the solution if youre interested) Solution: View solution Exercise 2 A topic of interest for economics and many other disciplines is ranking Lets now consider one of the most practical and important ranking problems the rank assigned to web pages by search engines (Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria see [DLP2012]) To understand the issue, consider the set of results returned by a query to a web search engine For the user, it is desirable to T HOMAS S ARGENT AND J OHN S TACHURSKI February 5, 2014


1. receive a large set of accurate matches
2. have the matches returned in order, where the order corresponds to some measure of importance

Ranking according to a measure of importance is the problem we now consider. The methodology developed to solve this problem by Google founders Larry Page and Sergey Brin is known as PageRank. To illustrate the idea, consider the following diagram

Imagine that this is a miniature version of the WWW, with

- each node representing a web page
- each arrow representing the existence of a link from one page to another

Now let's think about which pages are likely to be important, in the sense of being valuable to a search engine user


One possible criterion for importance of a page is the number of inbound links, an indication of popularity. By this measure, m and j are the most important pages, with 5 inbound links each. However, what if the pages linking to m, say, are not themselves important? Thinking this way, it seems appropriate to weight the inbound nodes by relative importance. The PageRank algorithm does precisely this. A slightly simplified presentation that captures the basic idea is as follows. Letting j be (the integer index of) a typical page and r_j be its ranking, we set

r_j = Σ_{i ∈ L_j} r_i / ℓ_i

where

- ℓ_i is the total number of outbound links from i
- L_j is the set of all pages i such that i has a link to j

This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/ℓ_i). There is, however, another interpretation, and it brings us back to Markov chains. Let P be the matrix given by P[i, j] = 1{i → j}/ℓ_i, where 1{i → j} = 1 if i has a link to j and zero otherwise. The matrix P is a stochastic matrix provided that each page has at least one link. With this definition of P we have

r_j = Σ_{i ∈ L_j} r_i / ℓ_i = Σ_{all i} 1{i → j} r_i / ℓ_i = Σ_{all i} P[i, j] r_i

Writing r for the row vector of rankings, this becomes r = rP. Hence r is the stationary distribution of the stochastic matrix P. Let's think of P[i, j] as the probability of moving from page i to page j. The value P[i, j] has the interpretation

- P[i, j] = 1/k if i has k outbound links, and j is one of them
- P[i, j] = 0 if i has no direct link to j

Thus, motion from page to page is that of a web surfer who moves from one page to another by randomly clicking on one of the links on that page. Here "random" means that each link is selected with equal probability. Since r is the stationary distribution of P, assuming that the uniform ergodicity condition is valid, we can interpret r_j as the fraction of time that a (very persistent) random surfer spends at page j
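Given this interpretation, r can be computed by iterating r ← rP until it stops changing (essentially the power method). Here is a sketch on a tiny hypothetical three-page graph, not the 14-node graph from the exercise below:

```python
import numpy as np

# hypothetical link structure: page 0 -> 1, 2;  page 1 -> 2;  page 2 -> 0
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

r = np.ones(3) / 3                   # start from the uniform distribution
for t in range(2000):
    r = np.dot(r, P)                 # r <- r P

# r is (approximately) the stationary distribution: r = r P
assert np.allclose(r, np.dot(r, P))
```

Page 2, which receives a link from every path through the graph, ties with page 0 for the top ranking here.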


Your exercise is to apply this ranking algorithm to the graph pictured above, and return the list of pages ordered by rank. The data for this graph is in the web_graph_data.txt file from the main repository; you can also view it here. There is a total of 14 nodes (i.e., web pages), the first named a and the last named n. A typical line from the file has the form
d -> h;

This should be interpreted as meaning that there exists a link from d to h To parse this le and extract the relevant information, you can use regular expressions The following code snippet provides a hint as to how you can go about this
In [1]: import re

In [2]: re.findall('\w', 'x +++ y ****** z')   # \w matches alphanumerics
Out[2]: ['x', 'y', 'z']

In [3]: re.findall('\w', 'a ^^ b &&& $$ c')
Out[3]: ['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while the lowest is a

Solution: View solution

Exercise 3 In numerical work it is sometimes convenient to replace a continuous model with a discrete one. In particular, Markov chains are routinely generated as discrete approximations to AR(1) processes of the form

y_{t+1} = ρ y_t + u_{t+1}

Here u_t is assumed to be iid and N(0, σ_u²). The variance of the stationary probability distribution of {y_t} is

σ_y² := σ_u² / (1 - ρ²)

Tauchen's method [Tauchen1986] is the most common method for approximating this continuous state process with a finite state Markov chain. As a first step we choose

- n, the number of states for the discrete approximation
- m, an integer that parameterizes the width of the state space

Next we create a state space {x_0, …, x_{n-1}} ⊂ ℝ and a stochastic n × n matrix P such that

- x_0 = -m σ_y
- x_{n-1} = m σ_y
- x_{i+1} = x_i + s where s = (x_{n-1} - x_0)/(n - 1)
- P[i, j] represents the probability of transitioning from x_i to x_j

Let F be the cumulative distribution function of the normal distribution N(0, σ_u²). The values P[i, j] are computed to approximate the AR(1) process; omitting the derivation, the rules are as follows:

1. If j = 0, then set P[i, j] = P[i, 0] = F(x_0 - ρ x_i + s/2)
2. If j = n - 1, then set P[i, j] = P[i, n-1] = 1 - F(x_{n-1} - ρ x_i - s/2)
3. Otherwise, set P[i, j] = F(x_j - ρ x_i + s/2) - F(x_j - ρ x_i - s/2)

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns {x_0, …, x_{n-1}} ⊂ ℝ and n × n matrix P as described above

Solution: View solution
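One possible implementation of these three rules, before you look at the official solution, might be sketched as follows (the use of scipy.stats.norm for F is an implementation choice, not prescribed by the exercise):

```python
import numpy as np
from scipy.stats import norm

def approx_markov(rho, sigma_u, m=3, n=7):
    """
    Tauchen's method: approximate the AR(1) process
    y_{t+1} = rho * y_t + u_{t+1}, u ~ N(0, sigma_u^2),
    by a finite Markov chain.  Returns (x, P) where x is the
    state vector and P the n x n transition matrix.
    """
    F = norm(scale=sigma_u).cdf              # cdf of N(0, sigma_u^2)
    sigma_y = np.sqrt(sigma_u**2 / (1 - rho**2))   # std dev of y_t
    x = np.linspace(-m * sigma_y, m * sigma_y, n)  # state space
    s = x[1] - x[0]                                # step size
    P = np.empty((n, n))
    for i in range(n):
        # rule 1: left edge
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)
        # rule 2: right edge
        P[i, n-1] = 1 - F(x[n-1] - rho * x[i] - s / 2)
        # rule 3: interior states
        for j in range(1, n-1):
            P[i, j] = (F(x[j] - rho * x[i] + s / 2)
                       - F(x[j] - rho * x[i] - s / 2))
    return x, P

x, P = approx_markov(0.9, 0.1)
assert np.allclose(P.sum(axis=1), 1.0)   # rows are distributions
```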

4.3 Shortest Paths


Overview
The shortest path problem is a classic problem in mathematics and computer science with applications in

- Economics (sequential decision making, analysis of social networks, etc.)
- Operations research and transportation
- Robotics and artificial intelligence
- Telecommunication network design and routing
- Etc., etc.

For us, the shortest path problem also provides a simple introduction to the logic of dynamic programming, which is one of our key topics. Variations of the methods we discuss are used millions of times every day, in applications such as Google Maps


Outline of the Problem


The shortest path problem is one of finding how to traverse a graph from one specified node to another at minimum cost. Consider the following graph

We wish to travel from node (vertex) A to node G at minimum cost

- Arrows (edges) indicate the movements we can take
- Numbers next to edges indicate the cost of traveling that edge

Possible interpretations of the graph include

- Minimum cost for supplier to reach a destination
- Routing of packets on the internet (minimize time)
- Etc., etc.

For this simple graph, a quick scan of the edges shows that the optimal paths are

- A, C, F, G at cost 8
- A, D, F, G at cost 8

Finding Least-Cost Paths


For large graphs we need a systematic solution


Let J(v) denote the minimum cost-to-go from node v, understood as the total cost from v if we take the best route. Suppose that we know J(v) for each node v, as shown below for the graph from the preceding example

Note that J(G) = 0. Intuitively, the best path can now be found as follows

- Start at A
- From node v, move to any node that solves

min_{w ∈ F_v} {c(v, w) + J(w)}    (4.7)

where

- F_v is the set of nodes that can be reached from v in one step
- c(v, w) is the cost of traveling from v to w

Hence, if we know the function J, then finding the best path is almost trivial. But how to find J? Some thought will convince you that, for every node v, the function J satisfies

J(v) = min_{w ∈ F_v} {c(v, w) + J(w)}    (4.8)

This is known as the Bellman equation. That is, J is the solution to the Bellman equation


There are algorithms for computing the minimum cost-to-go function J

Solving for J
The standard algorithm for finding J is to start with

J_0(v) = M if v ≠ destination, else J_0(v) = 0    (4.9)

where M is some large number. Now we use the following algorithm

1. Set n = 0
2. Set J_{n+1}(v) = min_{w ∈ F_v} {c(v, w) + J_n(w)} for all v
3. If J_{n+1} and J_n are not equal then increment n, go to 2

In general, this sequence converges to J; the proof is omitted
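The three steps above can be sketched directly in code; here the graph is a small hypothetical four-node example stored as a dictionary of dictionaries (node names, costs, and the destination handling are all illustrative):

```python
# graph[v] maps each successor w to the travel cost c(v, w)
graph = {
    'A': {'B': 1, 'C': 5},
    'B': {'C': 2, 'D': 4},
    'C': {'D': 1},
    'D': {},                  # destination: no outgoing edges
}

M = 10**6                     # the "large number" initial guess
J = {v: (0 if v == 'D' else M) for v in graph}

while True:
    J_new = {}
    for v in graph:
        if v == 'D':
            J_new[v] = 0      # cost-to-go at the destination stays zero
        else:
            J_new[v] = min(c + J[w] for w, c in graph[v].items())
    if J_new == J:            # stop at a fixed point of the update
        break
    J = J_new
```

Here the fixed point gives J(A) = 4, attained by the path A, B, C, D.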

Exercises
Exercise 1 Use the algorithm given above to find the optimal path (and its cost) for this graph. Here the line

node0, node1 0.04, node8 11.11, node14 72.21

means that from node0 we can go to

- node1 at cost 0.04
- node8 at cost 11.11
- node14 at cost 72.21

and so on. According to our calculations, the optimal path and its cost are like this. Your code should replicate this result.

Solution: View solution

4.4 Schelling's Segregation Model


Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [Schelling1969]. His model studies the dynamics of racially mixed neighborhoods. Like much of Schelling's work, the model shows how local interactions can lead to surprising aggregate structure

4.4. SCHELLINGS SEGREGATION MODEL

188

In particular, it shows that relatively mild preference for neighbors of similar race can lead in aggregate to the collapse of mixed neighborhoods, and high levels of segregation. In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Economic Sciences (joint with Robert Aumann). In this lecture we (in fact you) will build and run a version of Schelling's model

The Model
We will cover a variation of Schelling's model that is easy to program and captures its main idea

Set Up Suppose we have two types of people: orange people and green people

For the purpose of this lecture, we will assume there are 250 of each type

These agents all live on a single unit square

The location of an agent is just a point (x, y), where 0 < x, y < 1

Preferences We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same type

Here nearest is in terms of Euclidean distance

An agent who is not happy is called unhappy

An important point here is that agents are not averse to living in mixed areas

They are perfectly happy if half their neighbors are of the other color

Behavior Initially, agents are mixed together (integrated)

In particular, the initial location of each agent is an independent draw from a bivariate uniform distribution on S = (0, 1)^2

Now, cycling through the set of all agents, each agent is given the chance to stay or move

We assume that each agent will stay put if they are happy and move if unhappy

The algorithm for moving is as follows

1. Draw a random location in S
2. If happy at new location, move there
3. Else, go to step 1

In this way, we cycle continuously through the agents, moving as required

We continue to cycle until no-one wishes to move
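The happiness test described above can be sketched as follows. This is a hypothetical helper of our own (the function name and signature are assumptions, not the lecture's solution code): it ranks the other agents by Euclidean distance and counts same-type agents among the ten nearest

```python
from math import hypot

def is_happy(own_type, own_location, others, num_neighbors=10):
    """others: list of (type, (x, y)) pairs for all other agents."""
    # Sort the other agents by Euclidean distance from this agent
    distances = sorted(
        (hypot(own_location[0] - x, own_location[1] - y), agent_type)
        for agent_type, (x, y) in others
    )
    # Count same-type agents among the num_neighbors nearest
    same_type = sum(agent_type == own_type
                    for _, agent_type in distances[:num_neighbors])
    return same_type >= num_neighbors / 2
```

A happy agent stays put; an unhappy one repeatedly draws new locations until this test passes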

Results
Let's have a look at the results we got when we coded and ran this model

As discussed above, agents are initially mixed randomly together

But after several cycles they become segregated into distinct regions

In this instance, the program terminated after 4 cycles through the set of agents, indicating that all agents had reached a state of happiness

What is striking about the pictures is how rapidly racial integration breaks down

This is despite the fact that people in the model don't actually mind living mixed with the other type

Even with these preferences, the outcome is a high degree of segregation

Exercises
Rather than show you the program that generated these figures, we'll now ask you to write your own version

You can see our program at the end, when you look at the solution

Exercise 1 Implement and run this simulation for yourself

Consider the following structure for your program

- Agents are modeled as objects


(Have a look at this lecture if you've forgotten how to build your own objects)

Here's an indication of how they might look
* Data:
    * type (green or orange)
    * location
* Methods:
    * Determine whether happy or not given locations of other agents
    * If not happy, move
        * find a new location where happy

And here's some pseudocode for the main loop


while agents are still moving:
    for agent in agents:
        give agent the opportunity to move

Use 250 agents of each type Solution: View solution


4.5 LLN and CLT


Overview
This lecture illustrates two of the most important theorems of probability and statistics: the law of large numbers (LLN) and the central limit theorem (CLT)

These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic modeling

The lecture is based around simulations that show the LLN and CLT in action

We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold

In addition, we examine several useful extensions of the classical theorems, such as

- the delta method, for smooth functions of random variables
- the multivariate case

Some of these extensions are presented as exercises

Relationships
The CLT refines the LLN

The LLN gives conditions under which sample moments converge to population moments as sample size increases

The CLT provides information about the rate at which sample moments converge to population moments as sample size increases

LLN
We begin with the law of large numbers, which tells us when sample averages will converge to their population means

The Classical LLN The classical law of large numbers concerns independent and identically distributed (IID) random variables

Here is the strongest version of the classical LLN, known as Kolmogorov's strong law

Let X_1, \ldots, X_n be independent and identically distributed scalar random variables, with common distribution F

When it exists, let \mu denote the common mean of this sample:

    \mu := \mathbb{E} X = \int x F(dx)


In addition, let

    \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i

Kolmogorov's strong law states that, if \mathbb{E} |X| is finite, then

    \mathbb{P} \{ \bar X_n \to \mu \text{ as } n \to \infty \} = 1        (4.10)

What does this last expression mean?

Let's think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random samples (which of course it can't)

Let's also imagine that we can generate infinite sequences, so that the statement \bar X_n \to \mu can be evaluated

In this setting, (4.10) should be interpreted as meaning that the probability of the computer producing a sequence where \bar X_n \to \mu fails to occur is zero

Proof The proof of Kolmogorov's strong law is nontrivial (see, for example, theorem 8.3.5 of [Dudley2002])

On the other hand, we can prove a weaker version of the LLN very easily and still get most of the intuition

The version we prove is as follows: If X_1, \ldots, X_n is IID with \mathbb{E} X_i^2 < \infty, then, for any \epsilon > 0, we have

    \mathbb{P} \{ | \bar X_n - \mu | \geq \epsilon \} \to 0  \text{ as }  n \to \infty        (4.11)

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment)

To see that this is so, fix \epsilon > 0, and let \sigma^2 be the variance of each X_i

Recall the Chebyshev inequality, which tells us that

    \mathbb{P} \{ | \bar X_n - \mu | \geq \epsilon \} \leq \frac{ \mathbb{E} [ (\bar X_n - \mu)^2 ] }{ \epsilon^2 }        (4.12)

Now observe that

    \mathbb{E} [ (\bar X_n - \mu)^2 ]
        = \mathbb{E} \left[ \left( \frac{1}{n} \sum_{i=1}^n (X_i - \mu) \right)^2 \right]
        = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E} (X_i - \mu)(X_j - \mu)
        = \frac{1}{n^2} \sum_{i=1}^n \mathbb{E} (X_i - \mu)^2
        = \frac{\sigma^2}{n}

Here the crucial step is at the third equality, which follows from independence

Independence means that if i \neq j, then the covariance term \mathbb{E} (X_i - \mu)(X_j - \mu) drops out

As a result, n^2 - n terms vanish, leading us to a final expression that goes to zero in n

Combining our last result with (4.12), we come to the estimate

    \mathbb{P} \{ | \bar X_n - \mu | \geq \epsilon \} \leq \frac{\sigma^2}{n \epsilon^2}        (4.13)

The claim in (4.11) is now clear

Of course, if the sequence X_1, \ldots, X_n is correlated, then the cross-product terms \mathbb{E} (X_i - \mu)(X_j - \mu) are not necessarily zero

While this doesn't mean that the same line of argument is impossible, it does mean that if we want a similar result then the covariances should be almost zero for most of these terms

In a long sequence, this would be true if, for example, \mathbb{E} (X_i - \mu)(X_j - \mu) approached zero when the difference between i and j became large

In other words, the LLN can still work if the sequence X_1, \ldots, X_n has a kind of asymptotic independence, in the sense that correlation falls to zero as variables become further apart in the sequence

This idea is very important in time series analysis, and we'll come across it again soon enough

Illustration Let's now illustrate the classical IID law of large numbers using simulation

In particular, we aim to generate some sequences of IID random variables and plot the evolution of \bar X_n as n increases

Below is a figure that does just this (as usual, you can click on it to expand it)

It shows IID observations from three different distributions and plots \bar X_n against n in each case

The dots represent the underlying observations X_i for i = 1, \ldots, 100

In each of the three cases, convergence of \bar X_n to \mu occurs as predicted

The figure was produced by illustrates_lln.py, which is shown below (and can be found in the main repository)

The three distributions are chosen at random from a selection stored in the dictionary distributions

import random
import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, poisson
import matplotlib.pyplot as plt

n = 100

# == Arbitrary collection of distributions == #
distributions = {"student's t with 10 degrees of freedom": t(10),
                 "beta(2, 2)": beta(2, 2),
                 "lognormal LN(0, 1/2)": lognorm(0.5),
                 "gamma(5, 1/2)": gamma(5, scale=2),
                 "poisson(4)": poisson(4),
                 "exponential with lambda = 1": expon()}

# == Create a figure and some axes == #
num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(10, 10))

# == Set some plotting parameters to improve layout == #
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2, 'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
    # == Choose a randomly selected distribution == #
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # == Generate n draws from the distribution == #
    data = distribution.rvs(n)

    # == Compute sample mean at each n == #
    sample_mean = np.empty(n)
    for i in range(n):
        sample_mean[i] = np.mean(data[:i+1])

    # == Plot == #
    ax.plot(range(n), data, 'o', color='grey', alpha=0.5)
    axlabel = r'$\bar X_n$' + ' for ' + r'$X_i \sim$' + ' ' + name
    ax.plot(range(n), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(range(n), [m] * n, 'k--', lw=1.5, label=r'$\mu$')
    ax.vlines(range(n), m, data, lw=0.2)
    ax.legend(**legend_args)

plt.show()

Infinite Mean What happens if the condition \mathbb{E} |X| < \infty in the statement of the LLN is not satisfied?

This might be the case if the underlying distribution is heavy tailed; the best known example is the Cauchy distribution, which has density

    f(x) = \frac{1}{\pi (1 + x^2)}        (x \in \mathbb{R})

The next figure shows 100 independent draws from this distribution

Notice how extreme observations are far more prevalent here than in the previous figure

Let's now have a look at the behavior of the sample mean

Here we've increased n to 1000, but the sequence still shows no sign of converging

Will convergence become visible if we take n even larger? The answer is no

To see this, recall that the characteristic function of the Cauchy distribution is

    \phi(t) = \mathbb{E} e^{itX} = \int e^{itx} f(x) \, dx = e^{-|t|}        (4.14)

Using independence, the characteristic function of the sample mean becomes

    \mathbb{E} e^{it \bar X_n}
        = \mathbb{E} \exp \left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\}
        = \mathbb{E} \prod_{j=1}^n \exp \left\{ i \frac{t}{n} X_j \right\}
        = \prod_{j=1}^n \mathbb{E} \exp \left\{ i \frac{t}{n} X_j \right\}
        = [ \phi(t/n) ]^n

In view of (4.14), this is just e^{-|t|}

Thus, in the case of the Cauchy distribution, the sample mean itself has the very same Cauchy distribution, regardless of n

In particular, the sequence \bar X_n does not converge to a point
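The claim is easy to check numerically. In the sketch below (our own code, not from the lecture's repository) we compute many sample means of standard Cauchy draws for several values of n; since each sample mean is itself standard Cauchy, its interquartile range should stay near 2, the IQR of a standard Cauchy, no matter how large n becomes

```python
import numpy as np

rng = np.random.default_rng(1234)
for n in (10, 1000, 20000):
    # 200 independent sample means, each built from n Cauchy draws
    sample_means = rng.standard_cauchy((200, n)).mean(axis=1)
    q75, q25 = np.percentile(sample_means, [75, 25])
    print(n, q75 - q25)      # stays near 2 rather than shrinking with n
```

Contrast this with a finite-variance distribution, whose sample-mean IQR would shrink at rate 1/sqrt(n)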

CLT
Next we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means Statement of the Theorem The central limit theorem is one of the most remarkable results in all of mathematics In the classical IID setting, it tells us the following: If the sequence X1 , . . . , Xn is IID, with common mean and common variance 2 (0, ), then

n ) N (0, 2 ) n( X

as

(4.15)

Here N (0, 2 ) indicates convergence in distribution to a centered (i.e, zero mean) normal with standard deviation

Intuition The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding independent copies always leads to a Gaussian curve

A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [Dudley2002])

The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition

In fact all of the proofs of the CLT that we know of are similar in this respect

Why does adding independent copies produce a bell-shaped distribution?

Part of the answer can be obtained by investigating addition of independent Bernoulli random variables

In particular, let X_i be binary, with P\{X_i = 0\} = P\{X_i = 1\} = 0.5, and let X_1, \ldots, X_n be independent

Think of X_i = 1 as a success, so that Y_n = \sum_{i=1}^n X_i is the number of successes in n trials

The next figure plots the probability mass function of Y_n for n = 1, 2, 4, 8

When n = 1, the distribution is flat: one success or no successes have the same probability

When n = 2 we can either have 0, 1 or 2 successes

Notice the peak in probability mass at the mid-point k = 1

The reason is that there are more ways to get 1 success (fail then succeed or succeed then fail) than to get zero or two successes

Moreover, the two trials are independent, so the outcomes fail then succeed and succeed then fail are just as likely as the outcomes fail then fail and succeed then succeed
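The piling up of mass in the middle can be reproduced numerically by convolving the fair-coin distribution with itself; this is a sketch of our own, not the lecture's plotting code

```python
import numpy as np

coin = np.array([0.5, 0.5])           # pmf of a single fair Bernoulli trial
pmf = coin                            # pmf of Y_1
for n in range(2, 9):
    pmf = np.convolve(pmf, coin)      # pmf of Y_n = Y_{n-1} + X_n
    if n in (2, 4, 8):
        print(n, pmf)
```

For n = 4 this gives the binomial weights (0.0625, 0.25, 0.375, 0.25, 0.0625), with the familiar peak at the middle value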


(If there were positive correlation, say, then succeed then fail would be less likely than succeed then succeed)

Here, already we have the essence of the CLT: addition under independence leads probability mass to pile up in the middle and thin out at the tails

For n = 4 and n = 8 we again get a peak at the middle value (halfway between the minimum and the maximum possible value)

The intuition is the same: there are simply more ways to get these middle outcomes

If we continue, the bell-shaped curve becomes ever more pronounced

We are witnessing the binomial approximation of the normal distribution

Simulation 1 Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition

To this end, we now perform the following simulation

1. Choose an arbitrary distribution F for the underlying observations X_i
2. Generate independent draws of Y_n := \sqrt{n} ( \bar X_n - \mu )
3. Use these draws to compute some measure of their distribution, such as a histogram
4. Compare the latter to N(0, \sigma^2)

Here's some code that does exactly this for the exponential distribution F(x) = 1 - e^{-\lambda x}

(Please experiment with other choices of F, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment)
import numpy as np
from scipy.stats import expon, norm
import matplotlib.pyplot as plt
from matplotlib import rc

# == Specifying font, needs LaTeX integration == #
rc('font', **{'family': 'serif', 'serif': ['Palatino']})
rc('text', usetex=True)

# == Set parameters == #
n = 250                           # Choice of n
k = 100000                        # Number of draws of Y_n
distribution = expon(scale=2)     # Exponential distribution, lambda = 1/2
mu, s = distribution.mean(), distribution.std()

# == Draw underlying RVs. Each row contains a draw of X_1,..,X_n == #
data = distribution.rvs((k, n))

# == Compute mean of each row, producing k draws of \bar X_n == #
sample_means = data.mean(axis=1)

# == Generate observations of Y_n == #
Y = np.sqrt(n) * (sample_means - mu)

# == Plot == #
fig, ax = plt.subplots()
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()
plt.show()

The file is illustrates_clt.py, from the main repository

Notice the absence of for loops: every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code

The program produces figures such as the one below

The fit to the normal density is already tight, and can be further improved by increasing n

You can also experiment with other specifications of F

Note: You might need to delete or modify the lines beginning with rc to get this code to run on your computer

Simulation 2 Our next simulation is somewhat like the first, except that we aim to track the distribution of Y_n := \sqrt{n} ( \bar X_n - \mu ) as n increases

In the simulation we'll be working with random variables having \mu = 0

Thus, when n = 1, we have Y_1 = X_1, so the first distribution is just the distribution of the underlying random variable

For n = 2, the distribution of Y_2 is that of (X_1 + X_2) / \sqrt{2}, and so on

What we expect is that, regardless of the distribution of the underlying random variable, the distribution of Y_n will smooth out into a bell-shaped curve

The next figure shows this process for X_i \sim f, where f was specified as the convex combination of three different beta densities

(Taking a convex combination is an easy way to produce an irregular shape for f)

In the figure, the closest density is that of Y_1, while the furthest is that of Y_5

As expected, the distribution smooths out into a bell curve as n increases

The figure is generated by file clt3d.py, which is available from the main repository

We leave you to investigate its contents if you wish to know more

If you run the file from the ordinary IPython shell, the figure should pop up in a window that you can rotate with your mouse, giving different views on the density sequence

The Multivariate Case The law of large numbers and central limit theorem work just as nicely in multidimensional settings

To state the results, let's recall some elementary facts about random vectors

A random vector X is just a sequence of k random variables (X_1, \ldots, X_k)

Each realization of X is an element of \mathbb{R}^k

A collection of random vectors X_1, \ldots, X_n is called independent if, given any n vectors x_1, \ldots, x_n in \mathbb{R}^k, we have

    P\{ X_1 \leq x_1, \ldots, X_n \leq x_n \} = P\{ X_1 \leq x_1 \} \times \cdots \times P\{ X_n \leq x_n \}

(The vector inequality X \leq x means that X_j \leq x_j for j = 1, \ldots, k)

Let \mu_j := E[X_j] for all j = 1, \ldots, k

The expectation E[X] of X is defined to be the vector of expectations:

    E[X] := \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_k] \end{pmatrix}
          = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix}
          =: \mu

The variance-covariance matrix of random vector X is defined as

    \mathrm{Var}[X] := E[ (X - \mu)(X - \mu)' ]

Expanding this out, we get

    \mathrm{Var}[X] = \begin{pmatrix}
        E[(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & E[(X_1 - \mu_1)(X_k - \mu_k)] \\
        E[(X_2 - \mu_2)(X_1 - \mu_1)] & \cdots & E[(X_2 - \mu_2)(X_k - \mu_k)] \\
        \vdots & \ddots & \vdots \\
        E[(X_k - \mu_k)(X_1 - \mu_1)] & \cdots & E[(X_k - \mu_k)(X_k - \mu_k)]
    \end{pmatrix}

The j, k-th term is the scalar covariance between X_j and X_k

With this notation we can proceed to the multivariate LLN and CLT

Let X_1, \ldots, X_n be a sequence of independent and identically distributed random vectors, each one taking values in \mathbb{R}^k

Let \mu be the vector E[X_i], and let \Sigma be the variance-covariance matrix of X_i

Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

    \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i

In this setting, the LLN tells us that

    P\{ \bar X_n \to \mu \text{ as } n \to \infty \} = 1        (4.16)

Here \bar X_n \to \mu means that \| \bar X_n - \mu \| \to 0, where \| \cdot \| is the standard Euclidean norm

The CLT tells us that, provided \Sigma is finite,

    \sqrt{n} ( \bar X_n - \mu ) \stackrel{d}{\to} N(0, \Sigma)    \text{ as }    n \to \infty        (4.17)

Exercises
Exercise 1 One very useful consequence of the central limit theorem is as follows Assume the conditions of the CLT as stated above

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

4.5. LLN AND CLT

204

If g : \mathbb{R} \to \mathbb{R} is differentiable at \mu and g'(\mu) \neq 0, then

    \sqrt{n} \{ g(\bar X_n) - g(\mu) \} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2)    \text{ as }    n \to \infty        (4.18)

This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators, many of which can be expressed as functions of sample means

(These kinds of results are often said to use the delta method)

The proof is based on a Taylor expansion of g around the point \mu

Taking the result as given, let the distribution F of each X_i be uniform on [0, \pi/2] and let g(x) = \sin(x)

Derive the asymptotic distribution of \sqrt{n} \{ g(\bar X_n) - g(\mu) \} and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above

What happens when you replace [0, \pi/2] with [0, \pi]?

What is the source of the problem?

Solution: View solution

Exercise 2 Here's a result that's often used in developing statistical tests, and is connected to the multivariate central limit theorem

If you study econometric theory, you will see this result used again and again

Assume the setting of the multivariate CLT discussed above, so that

1. X_1, \ldots, X_n is a sequence of IID random vectors, each taking values in \mathbb{R}^k
2. \mu := E[X_i], and \Sigma is the variance-covariance matrix of X_i
3. The convergence

    \sqrt{n} ( \bar X_n - \mu ) \stackrel{d}{\to} N(0, \Sigma)        (4.19)

is valid

In a statistical setting, one often wants the right hand side to be standard normal, so that confidence intervals are easily computed

This normalization can be achieved on the basis of three observations

First, if X is a random vector in \mathbb{R}^k and A is constant and k \times k, then \mathrm{Var}[AX] = A \, \mathrm{Var}[X] \, A'

Second, by the continuous mapping theorem, if Z_n \stackrel{d}{\to} Z in \mathbb{R}^k and A is constant and k \times k, then A Z_n \stackrel{d}{\to} A Z

Third, if S is a k \times k symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called the inverse square root of S, such that Q S Q' = I

Here I is the k \times k identity matrix

Putting these things together, your first exercise is to show that if Q is the inverse square root of \Sigma, then

    Z_n := \sqrt{n} Q ( \bar X_n - \mu ) \stackrel{d}{\to} Z \sim N(0, I)

Applying the continuous mapping theorem one more time tells us that

    \| Z_n \|^2 \stackrel{d}{\to} \| Z \|^2

Given the distribution of Z, we conclude that

    n \| Q ( \bar X_n - \mu ) \|^2 \stackrel{d}{\to} \chi^2(k)        (4.20)

where \chi^2(k) is the chi-squared distribution with k degrees of freedom

(Recall that k is the dimension of X_i, the underlying random vectors)

Your second exercise is to illustrate the convergence in (4.20) with a simulation

In doing so, let

    X_i := \begin{pmatrix} W_i \\ U_i + W_i \end{pmatrix}

where

- each W_i is an IID draw from the uniform distribution on [-1, 1]
- each U_i is an IID draw from the uniform distribution on [-2, 2]
- U_i and W_i are independent of each other

Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it
2. You should be able to work out \Sigma from the preceding information

Solution: View solution
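As a sketch of hint 1, here is one way to compute the inverse square root Q of a symmetric positive definite matrix S, so that Q S Q' = I. The matrix S below is purely illustrative, not the \Sigma of the exercise

```python
import numpy as np
from scipy.linalg import sqrtm, inv

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # an example symmetric positive definite matrix

Q = inv(sqrtm(S))                 # inverse square root of S

# Q S Q' should recover the identity matrix, up to floating point error
print(np.allclose(Q @ S @ Q.T, np.eye(2)))
```

With Q in hand, the statistic in (4.20) is just n times the squared Euclidean norm of Q (\bar X_n - \mu)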

4.6 Linear State Space Models


"We may regard the present state of the universe as the effect of its past and the cause of its future" (Marquis de Laplace)

Overview
This lecture introduces the linear state space dynamic system

The model is easy to use and carries a powerful theory of prediction

It is a workhorse with many applications, such as

- representing dynamics of higher-order linear systems

- predicting the position of a system j steps into the future
- predicting a geometric sum of future values of a variable, like
    - non-financial income
    - dividends on a stock
    - the money supply
    - a government deficit or surplus
    - etc.

It is also a key ingredient of useful models, such as

- Friedman's permanent income model of consumption smoothing
- Barro's model of smoothing total tax collections
- the rational expectations version of Cagan's model of hyperinflation
- Sargent and Wallace's unpleasant monetarist arithmetic
- etc.

The Linear State Space Model


Objects in play

- An n \times 1 vector x_t denoting the state at time t = 0, 1, 2, \ldots
- An m \times 1 vector of iid shocks w_{t+1} \sim N(0, I)
- A k \times 1 vector y_t of observations at time t = 0, 1, 2, \ldots
- An n \times n matrix A called the transition matrix
- An n \times m matrix C called the volatility matrix
- A k \times n matrix G sometimes called the output matrix

Here is the linear state-space system

    x_{t+1} = A x_t + C w_{t+1}
    y_t = G x_t                                        (4.21)
    x_0 \sim N(\mu_0, \Sigma_0)

Primitives The primitives of the model are

1. the matrices A, C, G
2. the shock distribution, which we have specialized to N(0, I)
3. the distribution of the initial condition x_0, which we have set to N(\mu_0, \Sigma_0)
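Given the primitives, simulating (4.21) is a one-line recursion. The helper below is a minimal sketch of our own (its name and signature are assumptions, not part of the lecture's code base)

```python
import numpy as np

def simulate_lss(A, C, G, mu_0, Sigma_0, T, seed=0):
    """Simulate x_{t+1} = A x_t + C w_{t+1}, y_t = G x_t, x_0 ~ N(mu_0, Sigma_0)."""
    A, C, G = (np.atleast_2d(M) for M in (A, C, G))
    rng = np.random.default_rng(seed)
    n, m = C.shape
    x = np.empty((T + 1, n))
    x[0] = rng.multivariate_normal(np.ravel(mu_0), np.atleast_2d(Sigma_0))
    for t in range(T):
        x[t + 1] = A @ x[t] + C @ rng.standard_normal(m)   # one step of (4.21)
    return x, x @ G.T                                      # states and observations

# Example: a scalar AR(1) written in state space form
x, y = simulate_lss(A=[[0.9]], C=[[1.0]], G=[[1.0]], mu_0=[0.0],
                    Sigma_0=[[1.0]], T=50)
```

Each of the examples below (difference equations, autoregressions, trends) can be simulated by feeding in the corresponding A, C, G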

Given A, C, G and draws of x_0 and w_1, w_2, \ldots, the model (4.21) pins down the values of the sequences \{x_t\} and \{y_t\}

Even without these draws, the primitives 1-3 pin down the distributions of \{x_t\} and \{y_t\}

Later we'll see how to compute these distributions and their moments

Martingale difference shocks We've made the common assumption that the shocks are independent standardized normal vectors

But much of what we say will go through under the assumption that \{w_{t+1}\} is a martingale difference sequence

This means that it satisfies

    \mathbb{E} [ w_{t+1} \mid x_t, x_{t-1}, \ldots ] = 0

This is a weaker condition than that \{w_t\} is iid with w_{t+1} \sim N(0, I)

Examples By appropriate choice of the primitives, a variety of dynamics can be represented in terms of the linear state space model

The following examples help to highlight this point

They also illustrate the wise dictum finding the state is an art

Second-order difference equation Let \{y_t\} be a deterministic sequence that satisfies

    y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1}    \text{ s.t. }    y_0, y_1 \text{ given }        (4.22)

To map (4.22) into our state space system (4.21), we set

    x_t = \begin{pmatrix} 1 \\ y_t \\ y_{t-1} \end{pmatrix}
    \quad
    A = \begin{pmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{pmatrix}
    \quad
    C = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
    \quad
    G = \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}

You can confirm that under these definitions, (4.21) and (4.22) agree

The next figure shows dynamics of this process when \phi_0 = 1, \phi_1 = 0.8, \phi_2 = -0.8, y_0 = y_1 = 1

Later you'll be asked to recreate this figure

Univariate Autoregressive Processes We can use (4.21) to represent the model

    y_{t+1} = \phi_1 y_t + \phi_2 y_{t-1} + \phi_3 y_{t-2} + \phi_4 y_{t-3} + \sigma w_{t+1}        (4.23)

where \{w_t\} is iid and standard normal

To put this in the linear state space format we take

    x_t = \begin{pmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{pmatrix}
    \quad
    A = \begin{pmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
    \quad
    C = \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix}

and

    G = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}

The matrix A has the form of the companion matrix to the vector (\phi_1, \phi_2, \phi_3, \phi_4)

The next figure shows dynamics of this process when \phi_1 = 0.5, \phi_2 = -0.2, \phi_3 = 0, \phi_4 = 0.5, \sigma = 0.2, y_0 = y_1 = y_2 = y_3 = 1
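The companion-form construction above generalizes to any lag length p, and is easy to automate. The helper below is our own sketch (not the lecture's code), using the coefficient values quoted in the text

```python
import numpy as np

def companion_form(phis, sigma):
    """Return A, C, G putting an AR(p) model into the state space form (4.21)."""
    p = len(phis)
    A = np.zeros((p, p))
    A[0, :] = phis                  # first row holds the AR coefficients
    A[1:, :-1] = np.eye(p - 1)      # subdiagonal of ones shifts the lags down
    C = np.zeros((p, 1))
    C[0, 0] = sigma                 # the shock enters only the current value
    G = np.zeros((1, p))
    G[0, 0] = 1.0                   # observe the first state variable, y_t
    return A, C, G

A, C, G = companion_form([0.5, -0.2, 0.0, 0.5], sigma=0.2)
```

Feeding these matrices into a simulator of (4.21) reproduces the AR(4) dynamics described above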


Vector Autoregressions Now suppose that

- y_t is a k \times 1 vector
- \phi_j is a k \times k matrix and
- w_t is k \times 1

Then (4.23) is termed a vector autoregression

To map this into (4.21), we set

    x_t = \begin{pmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{pmatrix}
    \quad
    A = \begin{pmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \end{pmatrix}
    \quad
    C = \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix}
    \quad
    G = \begin{pmatrix} I & 0 & 0 & 0 \end{pmatrix}

where I is the k \times k identity matrix

Seasonals We can use (4.21) to represent

1. the deterministic seasonal y_t = y_{t-4}
2. the indeterministic seasonal y_t = \phi_4 y_{t-4} + w_t

In fact both are special cases of (4.23)

With the deterministic seasonal, the transition matrix becomes

    A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}

The eigenvalues are (1, -1, i, -i), and so have period four [4]

The resulting sequence oscillates deterministically with period four, and can be used to model deterministic seasonals in quarterly time series

The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations

Time Trends The model y_t = a t + b is known as a linear time trend

We can represent this model in the linear state space form by taking

    A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
    \quad
    C = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
    \quad
    G = \begin{pmatrix} a & b \end{pmatrix}        (4.24)

and starting at initial condition x_0 = (0, 1)'

In fact it's possible to use the state-space system to represent polynomial trends of any order
[4] For example, note that i = \cos(\pi/2) + i \sin(\pi/2) = e^{i \pi/2}, so the period associated with i is 2\pi / (\pi/2) = 4


For instance, let

    x_0 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
    \quad
    A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
    \quad
    C = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}

It follows that

    A^t = \begin{pmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}
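The formula for A^t is easy to verify numerically; a quick sketch using numpy (not part of the lecture's code):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

t = 7
At = np.linalg.matrix_power(A, t)   # compute A^t directly
expected = np.array([[1, t, t * (t - 1) // 2],
                     [0, 1, t],
                     [0, 0, 1]])
print(np.array_equal(At, expected))   # True
```

The same identity can be proved by writing A = I + N with N nilpotent and expanding (I + N)^t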

Then x_t = ( t(t-1)/2, \; t, \; 1 )', so that x_t contains linear and quadratic time trends

As a variation on the linear time trend model, consider

    y_t = t + b + \sigma \sum_{j=1}^t w_j

To modify (4.24) accordingly, we set

    A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
    \quad
    C = \begin{pmatrix} \sigma \\ 0 \end{pmatrix}
    \quad
    G = \begin{pmatrix} 1 & b \end{pmatrix}        (4.25)

For reasons explained below, this model is called a martingale with drift

Moving Average Representations A nonrecursive expression for x_t as a function of x_0, w_1, w_2, \ldots, w_t can be found by using (4.21) repeatedly to obtain

    x_t = A x_{t-1} + C w_t
        = A^2 x_{t-2} + A C w_{t-1} + C w_t
        \vdots
        = \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0        (4.26)

Representation (4.26) is a moving-average representation

It expresses \{x_t\} as a linear function of

1. current and past values of the process \{w_t\} and
2. the initial condition x_0

As an example of a moving average representation, recall the model (4.25)

You will be able to show that A^j C = (\sigma, 0)'

Substituting into the moving-average representation (4.26), we obtain

    x_{1t} = \sigma \sum_{j=0}^{t-1} w_{t-j} + \begin{pmatrix} 1 & t \end{pmatrix} x_0

where x_{1t} is the first entry of x_t

The first term on the right is a cumulated sum of martingale differences, and is therefore a martingale

The second term is a translated linear function of time

For this reason, the model is called a martingale with drift


Distributions and Moments


Unconditional Moments Using (4.21), it's easy to obtain expressions for the mean of x_t and y_t

Letting \mu_t := \mathbb{E} [x_t] and using linearity of expectations, we find that

    \mu_{t+1} = A \mu_t    \text{ with }    \mu_0 \text{ given }        (4.27)

The initial condition for (4.27) is the primitive \mu_0 from (4.21)

The expectation \mathbb{E} [y_t] of y_t is G \mu_t

The variance-covariance matrix of x_t is \Sigma_t := \mathbb{E} [ (x_t - \mu_t)(x_t - \mu_t)' ]

Using x_{t+1} - \mu_{t+1} = A (x_t - \mu_t) + C w_{t+1}, we can determine this matrix recursively via

    \Sigma_{t+1} = A \Sigma_t A' + C C'    \text{ with }    \Sigma_0 \text{ given }        (4.28)

The initial condition is \Sigma_0 from the initial distribution of x_0

As a matter of terminology, we will sometimes call

- \mu_t the unconditional mean of x_t
- \Sigma_t the unconditional variance-covariance matrix of x_t

This is to distinguish from cases described below, where conditioning information is used

However, you should be aware that these unconditional moments do depend on the initial distribution N(\mu_0, \Sigma_0)

Distributions In general, knowing the mean and variance-covariance matrix of a random vector is not quite as good as knowing the full distribution

However, there are some situations where these moments alone tell us all we need to know

One such situation is when the vector in question is Gaussian (i.e., normally distributed)

This is the case here, given

1. our Gaussian assumptions on the primitives
2. the fact that normality is preserved under linear operations

In fact, it's well-known that

    u \sim N(\bar u, S)  \text{ and }  v = a + B u  \implies  v \sim N(a + B \bar u, B S B')        (4.29)
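The recursions (4.27) and (4.28) are straightforward to iterate in code. The helper below is a sketch of our own (names and the scalar AR(1) example are assumptions, not the lecture's code); for a stable scalar model, \Sigma_t converges to the fixed point C^2 / (1 - A^2)

```python
import numpy as np

def moments(A, C, mu_0, Sigma_0, T):
    """Iterate mu_{t+1} = A mu_t and Sigma_{t+1} = A Sigma_t A' + C C' for T steps."""
    A, C = np.atleast_2d(A), np.atleast_2d(C)
    mu = np.atleast_1d(mu_0).astype(float)
    Sigma = np.atleast_2d(Sigma_0).astype(float)
    for _ in range(T):
        mu = A @ mu                       # (4.27)
        Sigma = A @ Sigma @ A.T + C @ C.T # (4.28)
    return mu, Sigma

# Scalar AR(1) example: x_{t+1} = 0.9 x_t + w_{t+1}, starting from a point mass at 1
mu_T, Sigma_T = moments(A=0.9, C=1.0, mu_0=1.0, Sigma_0=0.0, T=500)
```

Here mu_T is essentially zero and Sigma_T is essentially 1 / (1 - 0.81), the stationary variance of this AR(1)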

In particular, given our Gaussian assumptions on the primitives and the linearity of (4.21), we can see immediately that x_t and y_t are all Gaussian [5]

Since x_t is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix
[5] The correct way to argue this is by induction. Suppose that x_t is Gaussian. Then (4.21) and (4.29) imply that x_{t+1} is Gaussian. Since x_0 is assumed to be Gaussian, it follows that every x_t is Gaussian. Evidently this implies that each y_t is Gaussian.


But in fact we've already done this, in (4.27) and (4.28)

Letting \mu_t and \Sigma_t be as defined by these equations, we have

    x_t \sim N(\mu_t, \Sigma_t)    \text{ and }    y_t \sim N(G \mu_t, G \Sigma_t G')        (4.30)

Ensemble Interpretations How should we interpret the distributions defined by (4.30)?

Intuitively, the probabilities in a distribution correspond to relative frequencies in a large population drawn from that distribution

Let's adapt this idea to our setting, focusing on the distribution of y_T for fixed T

We can generate independent draws of y_T by repeatedly simulating the evolution of the system up to time T, using an independent set of shocks each time

The next figure shows 20 simulations, producing 20 time series for \{y_t\}, and hence 20 draws of y_T

The system in question is the univariate autoregressive model (4.23)

The values of y_T are represented by black dots in the left-hand figure

In the right-hand figure, these values are converted into a rotated histogram that shows a rough picture of the distribution

(The parameters and source code for the figures can be found in file paths_and_hist.py from the main repository)

Here is another figure, this time with 100 observations

Let's now try with 500,000 observations, showing only the histogram (without rotation)

The black line is the density of y_T calculated analytically, using (4.30)

You can see the histogram and analytical distribution match closely, as expected

By looking at the figures and experimenting with parameters, you will gain a feel for how the distribution depends on the model primitives listed above


Ensemble means In the preceding figure we recovered the distribution of y_T by

1. generating I sample paths (i.e., time series), where I is a large number
2. recording each observation y_T^i
3. histogramming this sample

Just as the histogram corresponds to the distribution, the ensemble or cross-sectional average

    ȳ_T := (1/I) Σ_{i=1}^I y_T^i

approximates the expectation E[y_T] = G μ_T (as implied by the law of large numbers). Here's a simulation comparing the ensemble average and true mean at time points t = 0, ..., 50. The parameters are the same as for the preceding figures, and the sample size is relatively small (I = 20).

The ensemble mean for x_t is

    x̄_T := (1/I) Σ_{i=1}^I x_T^i → μ_T    (I → ∞)

The right-hand side μ_T can be thought of as a population average. (By population average we mean the average across an infinite number of sample paths.) Another application of the law of large numbers assures us that

    (1/I) Σ_{i=1}^I (x_T^i − x̄_T)(x_T^i − x̄_T)' → Σ_T    (I → ∞)
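The law of large numbers claims above are easy to check numerically. Here is a minimal sketch for a scalar version of the model; the parameter values below are illustrative, not the ones used in the lectures:

```python
import numpy as np

np.random.seed(42)
A, C = 0.9, 1.0        # scalar state space: x_{t+1} = A x_t + C w_{t+1}
T, I = 50, 100_000     # horizon and number of sample paths

# Simulate I independent paths up to time T, starting from x_0 = 0
x = np.zeros(I)
for _ in range(T):
    x = A * x + C * np.random.randn(I)

ensemble_mean = x.mean()    # approximates mu_T (here mu_T = 0)
ensemble_var = x.var()      # approximates Sigma_T

# Population variance from Sigma_{t+1} = A Sigma_t A' + C C', Sigma_0 = 0
Sigma_T = sum(A**(2 * k) * C**2 for k in range(T))
```

With I this large, the ensemble moments land very close to the population moments.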

Joint Distributions In the preceding discussion we looked at the distributions of x_t and y_t in isolation. This gives us some information, but doesn't allow us to answer questions like


- what's the probability that x_t ≥ 0 for all t?
- what's the probability that y_t exceeds some value a before it falls below b?
- etc., etc.

Such questions concern the joint distributions of these sequences. To compute the joint distribution of x_0, x_1, ..., x_T, recall that in general joint and conditional densities are linked by the rule

    p(x, y) = p(y | x) p(x)    (joint = conditional × marginal)

From this rule we get p(x_0, x_1) = p(x_1 | x_0) p(x_0). Further applications of the same rule lead us to

    p(x_0, x_1, ..., x_T) = p(x_0) ∏_{t=0}^{T−1} p(x_{t+1} | x_t)

The marginal p(x_0) is just the primitive N(μ_0, Σ_0). In view of (4.21), the conditional densities are

    p(x_{t+1} | x_t) = N(A x_t, C C')

Here we're assuming that C C' is positive definite.

Autocovariance functions One important object related to the joint distribution is the autocovariance function

    Γ_{t+j,t} := E[(x_{t+j} − μ_{t+j})(x_t − μ_t)']    (4.31)

Elementary calculations show that

    Γ_{t+j,t} = A^j Σ_t    (4.32)

Notice that Γ_{t+j,t} in general depends on both j, the gap between the two dates, and t, the earlier date.
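The formula Γ_{t+j,t} = A^j Σ_t in (4.32) can be verified by Monte Carlo for a scalar model; the parameters below are illustrative:

```python
import numpy as np

np.random.seed(0)
A, C = 0.8, 1.0            # scalar model x_{t+1} = A x_t + C w_{t+1}
t, j, I = 5, 3, 200_000    # dates and number of independent paths

x = np.zeros(I)            # x_0 = 0 on every path, so means are zero
for s in range(t + j):
    x = A * x + C * np.random.randn(I)
    if s == t - 1:
        x_t = x.copy()     # record x_t for each path
x_tj = x                   # after the loop, x holds x_{t+j}

# Sample autocovariance versus the formula Gamma_{t+j,t} = A^j Sigma_t
Sigma_t = sum(A**(2 * k) * C**2 for k in range(t))
gamma_sample = np.mean(x_tj * x_t)
gamma_formula = A**j * Sigma_t
```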

Stationarity and Ergodicity


Two properties that greatly aid analysis of linear state space models when they hold are stationarity and ergodicity. Let's start with the intuition.

Visualizing Stability Let's look at some more time series from the same model we analyzed above. This picture shows cross-sectional distributions for y at times T, T′, T″. Note how the time series settle down, in the sense that the distributions at T′ and T″ are relatively similar to each other but unlike the distribution at T. In essence, the distribution of x_t is converging to a fixed long-run distribution as t → ∞. When such a distribution exists it is called a stationary distribution.


Stationary Distributions Formally, a distribution ψ∞ is said to be stationary for x_t if

    x_t ~ ψ∞  and  x_{t+1} = A x_t + C w_{t+1}  together imply  x_{t+1} ~ ψ∞

Since

1. in the present case all distributions are Gaussian
2. Gaussian distributions are pinned down by the mean and variance-covariance matrix

we can restate the definition as follows: ψ∞ is stationary for x_t if

    ψ∞ = N(μ∞, Σ∞)

where μ∞ and Σ∞ are fixed points of (4.27) and (4.28) respectively.

Covariance Stationary Processes Let's see what happens to the preceding figure if we start x_0 at the stationary distribution. The only difference between the observed distributions at T, T′ and T″ is due to the finite sample size. By choosing x_0 ~ N(μ∞, Σ∞), with μ∞ and Σ∞ the fixed points of (4.27) and (4.28) respectively, we've ensured that μ_t = μ∞ and Σ_t = Σ∞ for all t. Moreover, in view of (4.32), the autocovariance function takes the form Γ_{t+j,t} = A^j Σ∞, which depends on j but not on t. This motivates the following definition: a process {x_t} is said to be covariance stationary if

- both μ_t and Σ_t are constant in t
- Γ_{t+j,t} depends on the time gap j but not on time t

In our setting, {x_t} will be covariance stationary if x_0, A, C assume values implying that all these terms do not depend on t.

Conditions for Stationarity The globally stable case The difference equation μ_{t+1} = A μ_t is known to have unique fixed point μ∞ = 0 if all eigenvalues of A have modulus less than unity. That is, if

    (np.absolute(np.linalg.eigvals(A)) < 1).all() == True

The difference equation (4.28) also has a unique fixed point Σ∞ in this case, and, moreover,

    μ_t → μ∞ = 0  and  Σ_t → Σ∞  as  t → ∞

regardless of the initial conditions μ_0 and Σ_0. This is the globally stable case; see these notes for a more theoretical treatment. However, global stability is more than we need for stationary solutions, and often more than we want. To illustrate, consider our second-order difference equation example. Here the state is x_t = (1, y_t, y_{t−1})'. Because of the constant first component in the state vector, we will never have μ_t → 0. How can we find stationary solutions that respect a constant state component?


Processes with a constant state component To investigate this problem, suppose that A and C take the form

    A = ( A₁  a )      C = ( C₁ )
        ( 0   1 )          ( 0  )

where

- A₁ is an (n − 1) × (n − 1) matrix
- a is an (n − 1) × 1 column vector

Let x_t = (x_{1t}', 1)' where x_{1t} is (n − 1) × 1. It follows that

    x_{1,t+1} = A₁ x_{1t} + a + C₁ w_{t+1}

Let μ_{1t} = E[x_{1t}] and take expectations on both sides of this expression to get

    μ_{1,t+1} = A₁ μ_{1t} + a    (4.33)

Assume now that the moduli of the eigenvalues of A₁ are all strictly less than one. Then (4.33) has a unique stationary solution, which is

    μ₁∞ = (I − A₁)⁻¹ a

The stationary value of μ_t itself is then μ∞ := (μ₁∞', 1)'. The stationary values of Σ_t and Γ_{t+j,t} satisfy

    Σ∞ = A Σ∞ A' + C C'
    Γ_{t+j,t} = A^j Σ∞    (4.34)

Notice that Γ_{t+j,t} depends on the time gap j but not on calendar time t. In conclusion, if x_0 ~ N(μ∞, Σ∞) and the moduli of the eigenvalues of A₁ are all strictly less than unity, then the {x_t} process is covariance stationary, with constant state component.

Note: If the eigenvalues of A₁ are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on (4.28) converge to the fixed point of the discrete Lyapunov equation in the first line of (4.34).
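A quick numerical sketch of these formulas, with illustrative values of A₁, a and C₁: the stationary mean solves (I − A₁) μ₁∞ = a, and Σ∞ can be obtained by iterating Σ ← A Σ A' + C C' to a fixed point, as in note (b) above:

```python
import numpy as np

# Illustrative parameters (not from the lectures): a stable A1 block
A1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])
a = np.array([[1.0],
              [2.0]])
C1 = np.array([[0.5],
               [0.3]])

# Build A and C with the constant state component in the last position
A = np.block([[A1, a], [np.zeros((1, 2)), np.ones((1, 1))]])
C = np.vstack([C1, np.zeros((1, 1))])

# Stationary mean: mu_1 = (I - A1)^{-1} a, then append the constant 1
mu1 = np.linalg.solve(np.eye(2) - A1, a)
mu_inf = np.vstack([mu1, [[1.0]]])

# Stationary covariance: iterate Sigma <- A Sigma A' + C C' to a fixed point
Sigma = np.zeros((3, 3))
for _ in range(1000):
    Sigma = A @ Sigma @ A.T + C @ C.T
```

At the end of the loop, mu_inf satisfies μ∞ = A μ∞ and Sigma satisfies the Lyapunov equation up to numerical tolerance.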

Ergodicity Let's suppose that we're working with a covariance stationary process. In this case we know that the ensemble mean will converge to μ∞ as the sample size I converges to infinity.


Averages over time Ensemble averages across simulations are interesting theoretically, but in real life we often observe only a single time series {x_t, y_t}_{t=0}^T. So now let's take a single time series and form the time series averages

    x̄_T := (1/T) Σ_{t=1}^T x_t    and    ȳ_T := (1/T) Σ_{t=1}^T y_t

Do these time series averages converge to something interpretable in terms of our basic state-space representation? To get this desideratum, we require something called ergodicity. In general, ergodicity corresponds to the idea that time series and ensemble averages coincide. More formally, ergodicity implies that time series sample averages converge to their expectation under the stationary distribution. In particular,

- (1/T) Σ_{t=0}^T x_t → μ∞
- (1/T) Σ_{t=0}^T (x_t − x̄_T)(x_t − x̄_T)' → Σ∞
- (1/T) Σ_{t=0}^T (x_{t+j} − x̄_T)(x_t − x̄_T)' → A^j Σ∞

In our linear Gaussian setting, any covariance stationary process is also ergodic.
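The ergodicity claims can be illustrated with a single long simulated path of a scalar model; the parameters below are illustrative:

```python
import numpy as np

np.random.seed(1)
A, C = 0.7, 1.0           # stable scalar model x_{t+1} = A x_t + C w_{t+1}
T = 200_000

# One long sample path, started (for simplicity) at x_0 = 0
x = np.empty(T)
x[0] = 0.0
w = np.random.randn(T)
for t in range(T - 1):
    x[t + 1] = A * x[t] + C * w[t]

time_mean = x.mean()              # time average, should approach mu_inf = 0
time_var = x.var()                # should approach Sigma_inf
Sigma_inf = C**2 / (1 - A**2)     # scalar fixed point of (4.28)
```

A single path, given enough time, delivers the same moments as a cross-section of many paths.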

Prediction
The theory of prediction for linear state space systems is elegant and relatively straightforward.

Forecasting Formulas: Conditional Means A natural way to predict variables is to use conditional expectations. For example, the optimal forecast of x_{t+1} given information known at time t, namely x_t, is

    E[x_{t+1} | x_t] = A x_t    (4.35)

and the one-step-ahead forecast error is

    x_{t+1} − E[x_{t+1} | x_t] = C w_{t+1}    (4.36)

The covariance matrix of the forecast error is

    E[(x_{t+1} − E[x_{t+1} | x_t])(x_{t+1} − E[x_{t+1} | x_t])'] = C C'    (4.37)

More generally, we'd like to compute

- the j-step ahead forecast of x:  E_t[x_{t+j}] := E[x_{t+j} | x_t] = E[x_{t+j} | x_t, x_{t−1}, ..., x_0]
- the j-step ahead forecast of y:  E_t[y_{t+j}] := E[y_{t+j} | x_t] = E[y_{t+j} | x_t, x_{t−1}, ..., x_0]



Here are the pertinent formulas:


j-step ahead forecast of x:

    E_t[x_{t+j}] = A^j x_t

j-step ahead forecast of y:

    E_t[y_{t+j}] = G A^j x_t
Covariance of Prediction Errors It is useful to obtain the covariance matrix of the vector of j-step-ahead prediction errors

    x_{t+j} − E_t[x_{t+j}] = Σ_{s=0}^{j−1} A^s C w_{t−s+j}    (4.38)

Evidently,

    V_j := E_t[(x_{t+j} − E_t x_{t+j})(x_{t+j} − E_t x_{t+j})'] = Σ_{k=0}^{j−1} A^k C C' (A^k)'    (4.39)

V_j defined in (4.39) can be calculated recursively via

    V_1 = C C'    and    V_j = C C' + A V_{j−1} A',   j ≥ 2    (4.40)
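As a sanity check on the algebra, the recursion (4.40) can be compared against the direct sum (4.39); the matrices below are illustrative:

```python
import numpy as np

# Illustrative matrices (not the lecture's parameters)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0],
              [0.5]])
CC = C @ C.T

# Direct formula (4.39): V_j = sum_{k=0}^{j-1} A^k C C' (A^k)'
j = 5
V_direct = sum(np.linalg.matrix_power(A, k) @ CC @ np.linalg.matrix_power(A, k).T
               for k in range(j))

# Recursion (4.40): V_1 = C C', then V_j = C C' + A V_{j-1} A'
V = CC
for _ in range(j - 1):
    V = CC + A @ V @ A.T
```

Both routes produce the same forecast-error covariance matrix.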

V_j is the conditional covariance matrix of the errors in forecasting x_{t+j} on the basis of time t information x_t. Under particular conditions, V_j converges to

    V∞ = C C' + A V∞ A'    (4.41)

Equation (4.41) is an example of a discrete Lyapunov equation in the covariance matrix V∞. A sufficient condition for V_j to converge is that the eigenvalues of A be strictly less than one in modulus. Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one in modulus with elements of C that equal 0.

Forecasts of Geometric Sums In several contexts, we want to compute forecasts of geometric sums of future random variables governed by the linear state-space system (4.21). We want the following objects
- Forecast of a geometric sum of future x's, or E_t[Σ_{j=0}^∞ β^j x_{t+j} | x_t]
- Forecast of a geometric sum of future y's, or E_t[Σ_{j=0}^∞ β^j y_{t+j} | x_t]

These objects are important components of some famous and interesting dynamic models. For example,

- if {y_t} is a stream of dividends, then E_t[Σ_{j=0}^∞ β^j y_{t+j} | x_t] is a model of a stock price
- if {y_t} is the money supply, then E_t[Σ_{j=0}^∞ β^j y_{t+j} | x_t] is a model of the price level
j if {yt } is a stream of dividends, then E t j=0 yt+ j | xt is a model of a stock price j if {yt } is the money supply, then E t j=0 yt+ j | xt is a model of the price level


Formulas Fortunately, it is easy to use a little matrix algebra to compute these objects.

Note (useful fact): Suppose that the eigenvalues of A are all bounded in modulus by β⁻¹. Then

    I + βA + β²A² + ⋯ = [I − βA]⁻¹

The assumption about the eigenvalues of A assures that the series on the left converges.

Here are our formulas. Forecast of a geometric sum of future x's:

    E_t[Σ_{j=0}^∞ β^j x_{t+j} | x_t] = [I + βA + β²A² + ⋯] x_t = [I − βA]⁻¹ x_t

Forecast of a geometric sum of future y's:

    E_t[Σ_{j=0}^∞ β^j y_{t+j} | x_t] = G [I + βA + β²A² + ⋯] x_t = G [I − βA]⁻¹ x_t
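A minimal numerical check of the geometric sum formula, with illustrative values of β, A, G and x_t:

```python
import numpy as np

# Illustrative values (not from the lectures)
beta = 0.96
A = np.array([[0.9, 0.1],
              [0.0, 0.7]])
G = np.array([[1.0, 0.5]])
x_t = np.array([[2.0],
                [1.0]])

# Closed form: S_x = (I - beta A)^{-1} x_t and S_y = G S_x
S_x = np.linalg.solve(np.eye(2) - beta * A, x_t)
S_y = G @ S_x

# Compare with a long truncation of the series sum_j beta^j A^j x_t
term = x_t.copy()
S_trunc = np.zeros_like(x_t)
for _ in range(2000):
    S_trunc = S_trunc + term
    term = beta * (A @ term)
```

Solving the linear system is both faster and more accurate than summing the series term by term.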

Code
Our preceding simulations and calculations are based on the following code, which can be found in the file lss.py from the main repository. The code implements a class for handling linear state space models. The methods generate simulations, calculate moments and perform other related tasks.
""" Origin: QE by Thomas J. Sargent and John Stachurski Filename: lss.py LastModified: 30/01/2014 Computes quantities related to the linear state space model x_{t+1} = A x_t + C w_{t+1} y_t = G x_t The shocks {w_t} are iid and N(0, I) """ import numpy as np from numpy import dot from numpy.random import multivariate_normal from scipy.linalg import eig, solve, solve_discrete_lyapunov class LSS: def __init__(self, A, C, G, mu_0=None, Sigma_0=None): """ Provides initial parameters describing the state space model


            x_{t+1} = A x_t + C w_{t+1}
            y_t = G x_t

        where {w_t} are iid and N(0, I).  If the initial conditions mu_0
        and Sigma_0 for x_0 ~ N(mu_0, Sigma_0) are not supplied, both
        are set to zero.  When Sigma_0=0, the draw of x_0 is exactly
        mu_0.

        Parameters
        ============
        All arguments should be scalars or array_like

            * A is n x n
            * C is n x m
            * G is k x n
            * mu_0 is n x 1
            * Sigma_0 is n x n, positive definite and symmetric

""" self.A, self.G, self.C = map(self.convert, (A, G, C)) self.k, self.n = self.G.shape self.m = self.C.shape[1] # == Default initial conditions == # if mu_0 == None: self.mu_0 = np.zeros((self.n, 1)) else: self.mu_0 = np.asarray(mu_0) if Sigma_0 == None: self.Sigma_0 = np.zeros((self.n, self.n)) else: self.Sigma_0 = Sigma_0 def convert(self, x): """ Convert array_like objects (lists of lists, floats, etc.) into well formed 2D NumPy arrays """ return np.atleast_2d(np.asarray(x, dtype='float32')) def simulate(self, ts_length=100): """ Simulate a time series of length ts_length, first drawing x_0 ~ N(mu_0, Sigma_0) Returns ======== x : numpy.ndarray An n x ts_length array, where the t-th column is x_t y : numpy.ndarray A k x ts_length array, where the t-th column is y_t


""" x = np.empty((self.n, ts_length)) x[:,0] = multivariate_normal(self.mu_0, self.Sigma_0) w = np.random.randn(self.m, ts_length-1) for t in range(ts_length-1): x[:, t+1] = self.A.dot(x[:, t]) + self.C.dot(w[:, t]) y = self.G.dot(x) return x, y def replicate(self, T=10, num_reps=100): """ Simulate num_reps observations of x_T and y_T given x_0 ~ N(mu_0, Sigma_0). Returns ======== x : numpy.ndarray An n x num_reps array, where the j-th column is the j_th observation of x_T y : numpy.ndarray A k x num_reps array, where the j-th column is the j_th observation of y_T """ x = np.empty((self.n, num_reps)) for j in range(num_reps): x_T, _ = self.simulate(ts_length=T+1) x[:, j] = x_T[:, -1] y = self.G.dot(x) return x, y def moment_sequence(self): """ Create a generator to calculate the population mean and variance-convariance matrix for both x_t and y_t, starting at the initial condition (self.mu_0, self.Sigma_0). Returns ======== A generator, such that each iteration produces the moments of x and y, updated one unit of time. The moments are returned as a 4-tuple with the following interpretation: mu_x : numpy.ndarray An n x 1 array representing the population mean of x_t mu_y : numpy.ndarray A k x 1 array representing the population mean of y_t Sigma_x : numpy.ndarray An n x n array representing the variance-covariance matrix of x_t


        Sigma_y : numpy.ndarray
            A k x k array representing the variance-covariance matrix of y_t
        """
        # == Simplify names == #
        A, C, G = self.A, self.C, self.G
        # == Initial moments == #
        mu_x, Sigma_x = self.mu_0, self.Sigma_0
        while 1:
            mu_y, Sigma_y = G.dot(mu_x), G.dot(Sigma_x).dot(G.T)
            yield mu_x, mu_y, Sigma_x, Sigma_y
            # == Update moments of x == #
            mu_x = A.dot(mu_x)
            Sigma_x = A.dot(Sigma_x).dot(A.T) + C.dot(C.T)

    def stationary_distributions(self, max_iter=200, tol=1e-5):
        """
        Compute the moments of the stationary distributions of x_t and
        y_t if possible.  Computation is by iteration, starting from
        the initial conditions self.mu_0 and self.Sigma_0

        Returns
        ========
        mu_x_star : numpy.ndarray
            An n x 1 array representing the stationary mean of x_t
        mu_y_star : numpy.ndarray
            An k x 1 array representing the stationary mean of y_t
        Sigma_x_star : numpy.ndarray
            An n x n array representing the stationary var-cov matrix of x_t
        Sigma_y_star : numpy.ndarray
            An k x k array representing the stationary var-cov matrix of y_t
        """
        # == Initialize iteration == #
        m = self.moment_sequence()
        mu_x, mu_y, Sigma_x, Sigma_y = next(m)
        i = 0
        error = tol + 1
        # == Loop until convergence or failure == #
        while error > tol:
            if i > max_iter:
                fail_message = 'Convergence failed after {} iterations'
                raise ValueError(fail_message.format(max_iter))
            else:
                i += 1
                mu_x1, mu_y1, Sigma_x1, Sigma_y1 = next(m)
                error_mu = np.max(np.abs(mu_x1 - mu_x))
                error_Sigma = np.max(np.abs(Sigma_x1 - Sigma_x))


                error = max(error_mu, error_Sigma)
                mu_x, Sigma_x = mu_x1, Sigma_x1
        # == Prepare return values == #
        mu_x_star, Sigma_x_star = mu_x, Sigma_x
        mu_y_star, Sigma_y_star = mu_y1, Sigma_y1
        return mu_x_star, mu_y_star, Sigma_x_star, Sigma_y_star

    def geometric_sums(self, beta, x_t):
        """
        Forecast the geometric sums

            S_x := E_t [sum_{j=0}^{\infty} beta^j x_{t+j} | x_t ]
            S_y := E_t [sum_{j=0}^{\infty} beta^j y_{t+j} | x_t ]

        Parameters
        ===========
        beta : float
            Discount factor, in [0, 1)
        x_t : array_like
            The term x_t for conditioning

        Returns
        ========
        S_x : numpy.ndarray
            Geometric sum as defined above
        S_y : numpy.ndarray
            Geometric sum as defined above
        """
        I = np.identity(self.n)
        S_x = solve(I - beta * self.A, x_t)
        S_y = self.G.dot(S_x)
        return S_x, S_y

The code is relatively self-explanatory and adequately documented. One Python construct you might not be familiar with is the use of a generator function in the method moment_sequence(). Go back and read the relevant documentation if you've forgotten how they work. Examples of usage are given in the solutions to the exercises.
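If generator functions are new to you, here is a minimal self-contained example in the same spirit as moment_sequence(), iterating the scalar versions of (4.27) and (4.28). Note that it uses the built-in next(); older Python 2 code like the listing above calls m.next() instead:

```python
def moment_sequence(A, C, mu_0, Sigma_0):
    """Yield (mu_t, Sigma_t) for t = 0, 1, 2, ... in a scalar state space model."""
    mu, Sigma = mu_0, Sigma_0
    while True:
        yield mu, Sigma
        # Update via mu_{t+1} = A mu_t and Sigma_{t+1} = A Sigma_t A' + C C'
        mu = A * mu
        Sigma = A * Sigma * A + C * C

m = moment_sequence(A=0.9, C=1.0, mu_0=5.0, Sigma_0=0.0)
moments = [next(m) for _ in range(3)]   # first three (mu_t, Sigma_t) pairs
```

Each call to next() resumes the function body where it left off, so the generator carries the moment recursion forward one period at a time.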

Exercises
Exercise 1 Replicate this figure using the LSS class from lss.py. Solution: View solution


Exercise 2 Replicate this figure modulo randomness using the same class. Solution: View solution

Exercise 3 Replicate this figure modulo randomness using the same class. The state space model and parameters are the same as for the preceding exercise. Solution: View solution

Exercise 4 Replicate this figure modulo randomness using the same class. The state space model and parameters are the same as for the preceding exercise, except that the initial condition is the stationary distribution. Hint: You can use the stationary_distributions() method to get the initial conditions. The number of sample paths is 80, and the time horizon in the figure is 100. Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates 10, 50 and 75. Solution: View solution

4.7 A First Look at the Kalman Filter


Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who either

- have heard of the Kalman filter but don't know how it works, or
- know the Kalman filter equations, but don't know where they come from

For additional (more advanced) reading on the Kalman filter, see RMT3, section 2.7, and [AndersonMoore2005]. The last reference gives a particularly clear and comprehensive treatment of the Kalman filter. Required knowledge: familiarity with matrix manipulations, multivariate normal distributions, covariance matrices, etc.

The Basic Idea


The Kalman filter has many applications in economics, but for now let's pretend that we are rocket scientists. A missile has been launched from country Y and our mission is to track it.


Let x ∈ R² denote the current location of the missile, a pair indicating latitude-longitude coordinates on a map. At the present moment in time, the precise location x is unknown, but we do have some beliefs about x. One way to summarize our knowledge is a point prediction x̂. But what if the President wants to know the probability that the missile is currently over the Sea of Japan? Better to summarize our initial beliefs with a bivariate probability density p, where

    ∫_E p(x) dx

indicates the probability that we attach to the missile being in region E.

The density p is called our prior for the random variable x. To keep things tractable, we will always assume that our prior is Gaussian. In particular, we take

    p = N(x̂, Σ)    (4.42)

where x̂ is the mean of the distribution and Σ is a 2 × 2 covariance matrix. In our simulations, we will suppose that

    x̂ = (  0.2 )      Σ = ( 0.4   0.3  )    (4.43)
        ( −0.2 )          ( 0.3   0.45 )

This density p(x) is shown below as a contour map, with the center of the red ellipse being equal to x̂.

Figure 4.1: Prior density (Click this or any other figure to enlarge.)


The Filtering Step We are now presented with some good news and some bad news. The good news is that the missile has been located by our sensors, which report that the current location is y = (2.3, 1.9). The next figure shows the original prior p(x) and the new reported location y.

The bad news is that our sensors are imprecise. In particular, we should interpret the output of our sensor not as y = x, but rather as

    y = G x + v,   where   v ~ N(0, R)    (4.44)

Here G and R are 2 × 2 matrices with R positive definite. Both are assumed known, and the noise term v is assumed to be independent of x. How then should we combine our prior p(x) = N(x̂, Σ) and this new information y to improve our understanding of the location of the missile? As you may have guessed, the answer is to use Bayes' theorem, which tells us we should update our prior p(x) to p(x | y) via

    p(x | y) = p(y | x) p(x) / p(y)

where p(y) = ∫ p(y | x) p(x) dx.

In solving this for p(x | y), we observe that

- p(x) = N(x̂, Σ)
- in view of (4.44), the conditional density p(y | x) is N(Gx, R)
- p(y) does not depend on x, and enters into the calculations only as a normalizing constant


Because we are in a linear and Gaussian framework, the updated density can be computed by calculating population linear regressions. In particular, the solution is known [6] to be

    p(x | y) = N(x̂^F, Σ^F)

where

    x̂^F := x̂ + Σ G' (G Σ G' + R)⁻¹ (y − G x̂)   and   Σ^F := Σ − Σ G' (G Σ G' + R)⁻¹ G Σ    (4.45)

Here Σ G' (G Σ G' + R)⁻¹ is the matrix of population regression coefficients of the hidden object x − x̂ on the surprise y − G x̂. This new density p(x | y) = N(x̂^F, Σ^F) is shown in the next figure via contour lines and the color map. The original density is left in as contour lines for comparison.

Our new density twists the prior p(x) in a direction determined by the new information y − G x̂. In generating the figure, we set G to the identity matrix and R = 0.5Σ for Σ defined in (4.43). (The code for generating this and the subsequent figures can be found in the file gaussian_contours.py from the main repository.)

The Forecast Step What have we achieved so far? We have obtained probabilities for the current location of the state (missile) given prior and current information.
[6] See, for example, page 93 of [Bishop2006]. To get from his expressions to the ones used above, you will also need to apply the Woodbury matrix identity.


This is called filtering rather than forecasting, because we are filtering out noise rather than looking into the future. p(x | y) = N(x̂^F, Σ^F) is called the filtering distribution. But now let's suppose that we are given another task: to predict the location of the missile after one unit of time (whatever that may be) has elapsed. To do this we need a model of how the state evolves. Let's suppose that we have one, and that it's linear and Gaussian. In particular,

    x_{t+1} = A x_t + w_{t+1},   where   w_t ~ N(0, Q)    (4.46)

Our aim is to combine this law of motion and our current distribution p(x | y) = N(x̂^F, Σ^F) to come up with a new predictive distribution for the location one unit of time hence. In view of (4.46), all we have to do is introduce a random vector x^F ~ N(x̂^F, Σ^F) and work out the distribution of A x^F + w, where w is independent of x^F and has distribution N(0, Q). Since linear combinations of Gaussians are Gaussian, A x^F + w is Gaussian. Elementary calculations and the expressions in (4.45) tell us that

    E[A x^F + w] = A E[x^F] + E[w] = A x̂^F = A x̂ + A Σ G' (G Σ G' + R)⁻¹ (y − G x̂)

and

    Var[A x^F + w] = A Var[x^F] A' + Q = A Σ^F A' + Q = A Σ A' − A Σ G' (G Σ G' + R)⁻¹ G Σ A' + Q

The matrix A Σ G' (G Σ G' + R)⁻¹ is often written as K_Σ and called the Kalman gain. (The subscript Σ has been added to remind us that K_Σ depends on Σ, but not on y or x̂.) Using this notation, we can summarize our results as follows: our updated prediction is the density N(x̂_new, Σ_new) where

    x̂_new := A x̂ + K_Σ (y − G x̂)
    Σ_new := A Σ A' − K_Σ G Σ A' + Q    (4.47)

The density p_new(x) = N(x̂_new, Σ_new) is called the predictive distribution. The predictive distribution is the new density shown in the following figure, where the update has used parameters

    A = ( 1.2   0.0 )      Q = 0.3 Σ
        ( 0.0  −0.2 )
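The filtering and forecast steps can be traced through in a few lines of NumPy. The numbers below follow the worked example as far as it can be reconstructed here; in particular the signs in x̂, the entries of A, and the choice Q = 0.3Σ should be treated as illustrative rather than authoritative:

```python
import numpy as np

# Prior moments and observation (illustrative values)
x_hat = np.array([[0.2], [-0.2]])
Sigma = np.array([[0.4, 0.3],
                  [0.3, 0.45]])
G = np.eye(2)
R = 0.5 * Sigma
y = np.array([[2.3], [1.9]])
A = np.array([[1.2, 0.0],
              [0.0, -0.2]])
Q = 0.3 * Sigma

# Filtering step (4.45)
M = Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)   # regression coefficients
x_hat_F = x_hat + M @ (y - G @ x_hat)
Sigma_F = Sigma - M @ G @ Sigma

# Forecast step: predictive density N(x_hat_new, Sigma_new)
x_hat_new = A @ x_hat_F
Sigma_new = A @ Sigma_F @ A.T + Q
```

With G = I and R = 0.5Σ, the regression matrix reduces to (2/3)I, so the filtered mean is a simple weighted average of the prior mean and the observation, and Σ^F = Σ/3.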

The Recursive Procedure Let's look back at what we've done. We started the current period with a prior p(x) for the location of the missile. We then used the current measurement y to update to p(x | y). Finally, we used the law of motion (4.46) for {x_t} to update to p_new(x). If we now step into the next period, we are ready to go round again, taking p_new(x) as the current prior. Swapping notation p_t(x) for p(x) and p_{t+1}(x) for p_new(x), the full recursive procedure is:


1. Start the current period with prior p_t(x) = N(x̂_t, Σ_t)
2. Observe current measurement y_t
3. Compute the filtering distribution p_t(x | y) = N(x̂_t^F, Σ_t^F) from p_t(x) and y_t, applying Bayes' rule and the conditional distribution (4.44)
4. Compute the predictive distribution p_{t+1}(x) = N(x̂_{t+1}, Σ_{t+1}) from the filtering distribution and (4.46)
5. Increment t by one and go to step 1

Repeating (4.47), the dynamics for x̂_t and Σ_t are as follows

    x̂_{t+1} = A x̂_t + K_{Σ_t} (y_t − G x̂_t)
    Σ_{t+1} = A Σ_t A' − K_{Σ_t} G Σ_t A' + Q    (4.48)

These are the standard dynamic equations for the Kalman filter. See, for example, RMT3, page 58.

Convergence
The matrix Σ_t is a measure of the uncertainty of our prediction x̂_t of x_t. Apart from special cases, this uncertainty will never be fully resolved, regardless of how much time elapses. The reason is that our prediction x̂_t is made based on information available at t − 1, not t. Even if we know the precise value of x_{t−1} (which we don't), the transition equation (4.46) implies that x_t = A x_{t−1} + w_t. Since the shock w_t is not observable at t − 1, any time t − 1 prediction of x_t will incur some error (unless w_t is degenerate).


However, it is certainly possible that Σ_t converges to a constant matrix as t → ∞. To study this topic, let's expand the second equation in (4.48):

    Σ_{t+1} = A Σ_t A' − A Σ_t G' (G Σ_t G' + R)⁻¹ G Σ_t A' + Q    (4.49)

This is a nonlinear difference equation in Σ_t. A fixed point of (4.49) is a constant matrix Σ such that

    Σ = A Σ A' − A Σ G' (G Σ G' + R)⁻¹ G Σ A' + Q    (4.50)

Equation (4.50) is known as a discrete time algebraic Riccati equation. Conditions under which a fixed point exists and the sequence {Σ_t} converges to it are discussed in [AHMS1996] and [AndersonMoore2005], chapter 4. One sufficient (but not necessary) condition is that all the eigenvalues λ_i of A satisfy |λ_i| < 1 (cf. e.g., [AndersonMoore2005], p. 77). (This is a kind of contraction condition that forces the distribution of {x_t} to converge.) In this case, for any initial choice of Σ_0 that is both nonnegative and symmetric, the sequence {Σ_t} in (4.49) converges to a nonnegative symmetric matrix Σ that solves (4.50).
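The iteration (4.49) is easy to carry out directly. A sketch with illustrative parameters for which A satisfies the eigenvalue condition:

```python
import numpy as np

# Illustrative parameters: all eigenvalues of A inside the unit circle
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])
G = np.eye(2)
Q = 0.3 * np.eye(2)
R = 0.5 * np.eye(2)

# Iterate (4.49) from Sigma_0 = I until (approximate) convergence
Sigma = np.eye(2)
for _ in range(500):
    S = G @ Sigma @ G.T + R
    Sigma = A @ Sigma @ A.T - A @ Sigma @ G.T @ np.linalg.solve(S, G @ Sigma @ A.T) + Q

# At convergence, Sigma solves the Riccati equation (4.50)
S = G @ Sigma @ G.T + R
residual = A @ Sigma @ A.T - A @ Sigma @ G.T @ np.linalg.solve(S, G @ Sigma @ A.T) + Q - Sigma
```

In practice one uses a dedicated solver (as stationary_values() below does, via a doubling algorithm), but naive iteration makes the fixed-point property transparent.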

Implementation: kalman.py
This section describes a module called kalman that implements the Kalman filter. The module is in file kalman.py, available from the main repository. In the module, the updating rules are wrapped up in a class, which bundles together

Instance data:

- the parameters A, G, Q, R of a given model
- the moments (x̂_t, Σ_t) of the current prior

Methods:
- a method prior_to_filtered() to update (x̂_t, Σ_t) to (x̂_t^F, Σ_t^F)

- a method filtered_to_forecast() to update the filtering distribution to the predictive distribution, which becomes the new prior (x̂_{t+1}, Σ_{t+1})
- an update() method, which combines the last two methods
- a stationary_values() method, which computes the solution to (4.50) and the corresponding (stationary) Kalman gain

The program is relatively simple and self-explanatory:
import numpy as np from numpy import dot from scipy.linalg import inv import riccati


class Kalman:

    def __init__(self, A, G, Q, R):
        """
        Provides initial parameters describing the state space model

            x_{t+1} = A x_t + w_{t+1}
            y_t = G x_t + v_t

        Parameters
        ============
        All arguments should be scalars or array_like

            * A is n x n
            * Q is n x n, symmetric and nonnegative definite  (w_t ~ N(0, Q))
            * G is k x n
            * R is k x k, symmetric and nonnegative definite  (v_t ~ N(0, R))

""" self.A, self.G, self.Q, self.R = map(self.convert, (A, G, Q, R)) self.k, self.n = self.G.shape def convert(self, x): """ Convert array_like objects (lists of lists, floats, etc.) into well formed 2D NumPy arrays """ return np.atleast_2d(np.asarray(x, dtype='float32')) def set_state(self, x_hat, Sigma): """ Set the state, which is the mean x_hat and covariance matrix Sigma of the prior/predictive density. * x_hat is n x 1 * Sigma is n x n and positive definite Must be Python scalars or NumPy arrays. """ self.current_Sigma = self.convert(Sigma) self.current_x_hat = self.convert(x_hat) self.current_x_hat.shape = self.n, 1 def prior_to_filtered(self, y): """ Updates the moments (x_hat, Sigma) of the time t prior to the time t filtering distribution, using current measurement y_t. The parameter y should be a Python scalar or NumPy array. The updates are according to x_hat^F = x_hat + Sigma G' (G Sigma G' + R)^{-1}(y - G x_hat)


Sigma^F = Sigma - Sigma G' (G Sigma G' + R)^{-1} G Sigma """ # === simplify notation === # G, R = self.G, self.R x_hat, Sigma = self.current_x_hat, self.current_Sigma # === and then update === # y = self.convert(y) y.shape = self.k, 1 A = dot(Sigma, G.T) B = dot(dot(G, Sigma), G.T) + R M = dot(A, inv(B)) self.current_x_hat = x_hat + dot(M, (y - dot(G, x_hat))) self.current_Sigma = Sigma - dot(M, dot(G, Sigma)) def filtered_to_forecast(self): """ Updates the moments of the time t filtering distribution to the moments of the predictive distribution -- which becomes the time t+1 prior """ # === simplify notation === # A, Q = self.A, self.Q x_hat, Sigma = self.current_x_hat, self.current_Sigma # === and then update === # self.current_x_hat = dot(A, x_hat) self.current_Sigma = dot(A, dot(Sigma, A.T)) + Q def update(self, y): """ Updates x_hat and Sigma given k x 1 ndarray y. The full update, from one period to the next """ self.prior_to_filtered(y) self.filtered_to_forecast() def stationary_values(self): """ Computes the limit of Sigma_t as t goes to infinity by solving the associated Riccati equation. Computation is via the doubling algorithm (see the documentation in riccati.dare). Returns the limit and the stationary Kalman gain. """ # === simplify notation === # A, Q, G, R = self.A, self.Q, self.G, self.R # === solve Riccati equation, obtain Kalman gain === # Sigma_infinity = riccati.dare(A.T, G.T, R, Q) temp1 = dot(dot(A, Sigma_infinity), G.T) temp2 = inv(dot(G, dot(Sigma_infinity, G.T)) + R) K_infinity = dot(temp1, temp2) return Sigma_infinity, K_infinity


Exercises
Exercise 1 Consider the following simple application of the Kalman filter, loosely based on RMT3, section 2.9.2. Suppose that

- all variables are scalars
- the hidden state {x_t} is in fact constant, equal to some θ ∈ R unknown to the modeler

State dynamics are therefore given by (4.46) with A = 1, Q = 0 and x_0 = θ. The measurement equation is y_t = θ + v_t, where v_t is N(0, 1) and iid. The task of this exercise is to simulate the model and, using the module kalman, plot the first five predictive densities p_t(x) = N(x̂_t, Σ_t). As shown in RMT3, sections 2.9.1-2.9.2, these distributions asymptotically put all mass on the unknown value θ. In the simulation, take θ = 10, x̂_0 = 8 and Σ_0 = 1. Your figure should, modulo randomness, look something like this
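Without giving away the full plotting solution, here is a sketch of the uncertainty dynamics in this exercise. With A = 1, Q = 0, G = 1, R = 1 and Σ_0 = 1, iterating the filter gives Σ_t = 1/(t + 1), so the predictive densities tighten around θ:

```python
import numpy as np

np.random.seed(7)
theta = 10.0              # unknown constant hidden state
x_hat, Sigma = 8.0, 1.0   # prior moments
A, G, Q, R = 1.0, 1.0, 0.0, 1.0

Sigmas = [Sigma]
for t in range(5):
    y = theta + np.random.randn()        # measurement y_t = theta + v_t
    # Filtering step (4.45), scalar case
    K = Sigma * G / (G * Sigma * G + R)
    x_hat_F = x_hat + K * (y - G * x_hat)
    Sigma_F = Sigma - K * G * Sigma
    # Forecast step: moments of the new prior
    x_hat = A * x_hat_F
    Sigma = A * Sigma_F * A + Q
    Sigmas.append(Sigma)

# With Sigma_0 = 1 the recursion yields Sigma_t = 1 / (t + 1)
```

The mean x_hat is random (it depends on the draws of v_t), but the variance sequence is deterministic, which is why the densities shrink so regularly in the figure.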

Solution: View solution

Exercise 2 The preceding figure gives some support to the idea that probability mass converges to θ


To get a better idea, choose a small ε > 0 and calculate

    z_t := 1 − ∫_{θ−ε}^{θ+ε} p_t(x) dx

for t = 0, 1, 2, . . . , T

Plot z_t against t, setting ε = 0.1 and T = 600

Your figure should show error declining something like this, although the output is quite variable
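Since each predictive density is Gaussian, z_t can be computed from normal CDFs rather than by numerical integration. A sketch (the function name and defaults are our own):

```python
from scipy.stats import norm

def coverage_error(x_hat, Sigma, theta=10.0, eps=0.1):
    """z_t: one minus the mass that p_t = N(x_hat, Sigma) places on
    the interval (theta - eps, theta + eps)"""
    sd = Sigma ** 0.5
    mass = (norm.cdf(theta + eps, loc=x_hat, scale=sd)
            - norm.cdf(theta - eps, loc=x_hat, scale=sd))
    return 1 - mass
```

Running the filter for T = 600 periods and recording coverage_error at each date gives the declining error path asked for in the exercise.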

Solution: View solution

Exercise 3 As discussed above, if the shock sequence {w_t} is not degenerate, then it is not in general possible to predict x_t without error at time t − 1 (and this would be the case even if we could observe x_{t−1})

Let's now compare the prediction x̂_t made by the Kalman filter against a competitor who is allowed to observe x_{t−1}

This competitor will use the conditional expectation E[x_t | x_{t−1}], which in this case is A x_{t−1}

The conditional expectation is known to be the optimal prediction method in terms of minimizing mean squared error

(More precisely, the minimizer of E ‖x_t − g(x_{t−1})‖² with respect to g is g*(x_{t−1}) := E[x_t | x_{t−1}])

Thus we are comparing the Kalman filter against a competitor who has more information (in the sense of being able to observe the latent state) and behaves optimally in terms of minimizing squared error


Our horse race will be assessed in terms of squared error

In particular, your task is to generate a graph plotting observations of both ‖x_t − A x_{t−1}‖² and ‖x_t − x̂_t‖² against t for t = 1, . . . , 50

For the parameters, set G = I, R = 0.5 I and Q = 0.3 I, where I is the 2 × 2 identity

Set

    A = [ 0.5  0.4 ]
        [ 0.6  0.3 ]

To initialize the prior density, set

    Σ_0 = [ 0.9  0.3 ]
          [ 0.3  0.9 ]

and x̂_0 = (8, 8)

Finally, set x_0 = (0, 0)

You should end up with a figure similar to the following (modulo randomness)

Observe how, after an initial learning period, the Kalman filter performs quite well, even relative to the competitor who predicts optimally with knowledge of the latent state

Solution: View solution

Exercise 4 Try varying the coefficient 0.3 in Q = 0.3 I up and down

Observe how the diagonal values in the stationary solution (see (4.50)) increase and decrease in line with this coefficient

The interpretation is that more randomness in the law of motion for x_t causes more (permanent) uncertainty in prediction

Solution: View solution


4.8 Infinite Horizon Dynamic Programming

Overview
In a previous lecture we gained some intuition about finite stage dynamic programming by studying the shortest path problem

The aim of this lecture is to introduce readers to methods for solving simple infinite-horizon dynamic programming problems using Python

We will also introduce and motivate some of the modeling choices used throughout the lectures to treat this class of problems

The particular application we will focus on is solving for consumption in an optimal growth model

Although the model is quite specific, the key ideas extend to many other problems in dynamic optimization

The model is also very simplistic: we favor ease of exposition over realistic assumptions throughout the current lecture

Other References

For supplementary reading see

- RMT3, section 3.1
- EDTC, section 6.2 and chapter 10
- [Sundaram1996], chapter 12
- [StokeyLucas1989], chapters 2-5
- [HernandezLermaLasserre1996], all

An Optimal Growth Model

Consider an agent who owns at time t capital stock k_t ∈ ℝ₊ := [0, ∞) and produces output f(k_t) ∈ ℝ₊

This output can either be consumed or saved as capital for next period

For simplicity we assume that depreciation is total, so that next period capital is just output minus consumption:

    k_{t+1} = f(k_t) − c_t    (4.51)

Taking k_0 as given, we suppose that the agent wishes to maximize

    ∑_{t=0}^∞ β^t u(c_t)    (4.52)

where u is a given utility function and β ∈ (0, 1) is a discount factor

More precisely, the agent wishes to select a path c_0, c_1, c_2, . . . for consumption that is


1. nonnegative
2. feasible in the sense that the capital path {k_t} determined by {c_t}, k_0 and (4.51) is always nonnegative
3. optimal in the sense that it maximizes (4.52) relative to all other feasible consumption sequences

A well-known result from the standard theory of dynamic programming (cf., e.g., [StokeyLucas1989], section 4.1) states that, for this kind of problem, any optimal consumption sequence {c_t} must be Markov

That is, there exists a function σ such that c_t = σ(k_t) for all t

In other words, the current control is a fixed (i.e., time homogeneous) function of the current state

The Policy Function Approach

As it turns out, we are better off seeking the function σ directly, rather than the optimal consumption sequence

The main reason is that the functional approach (seeking the optimal policy) translates directly over to the stochastic case, whereas the sequential approach does not

For this model, we will say that a function σ mapping ℝ₊ into ℝ₊ is a feasible consumption policy if it satisfies

    0 ≤ σ(k) ≤ f(k)  for all  k ∈ ℝ₊    (4.53)

The set of all such policies will be denoted by Σ

Using this notation, the agent's decision problem can be rewritten as

    max_{σ∈Σ} ∑_{t=0}^∞ β^t u(σ(k_t))    (4.54)

where the sequence {k_t} in (4.54) is given by

    k_{t+1} = f(k_t) − σ(k_t),  k_0 given    (4.55)

In the next section we discuss how to solve this problem for the maximizing σ

Dynamic Programming
We will solve for the optimal policy using dynamic programming

The first step is to define the policy value function v_σ associated with a given policy σ, which is

    v_σ(k_0) := ∑_{t=0}^∞ β^t u(σ(k_t))    (4.56)

when {k_t} is given by (4.55)


Evidently v_σ(k_0) is the total present value of discounted utility associated with following policy σ forever, given initial capital k_0

The value function for this optimization problem is then defined as

    v*(k_0) := sup_{σ∈Σ} v_σ(k_0)    (4.57)

The value function gives the maximal value that can be obtained from state k_0, after considering all feasible policies

A policy σ ∈ Σ is called optimal if it attains the supremum in (4.57) for all k_0 ∈ ℝ₊

The Bellman equation for this problem takes the form

    v*(k) = max_{0 ≤ c ≤ f(k)} {u(c) + β v*(f(k) − c)}  for all  k ∈ ℝ₊    (4.58)

It states that maximal value from a given state can be obtained by trading off current reward from a given action against the (discounted) future value of the state resulting from that action

(If the intuition behind the Bellman equation is not clear to you, try working through this lecture)

As a matter of notation, given a continuous function w on ℝ₊, we say that policy σ ∈ Σ is w-greedy if σ(k) is a solution to

    max_{0 ≤ c ≤ f(k)} {u(c) + β w(f(k) − c)}    (4.59)

for every k ∈ ℝ₊

Theoretical Results

As with most optimization problems, conditions for existence of a solution typically require some form of continuity and compactness

In addition, some restrictions are needed to ensure that the sum of discounted utility is always finite

For example, if we are prepared to assume that f and u are continuous and u is bounded, then

1. The value function v* is finite, bounded, continuous and satisfies the Bellman equation
2. At least one optimal policy exists
3. A policy is optimal if and only if it is v*-greedy

(For a proof see, for example, proposition 10.1.13 of EDTC)

In view of these results, to find an optimal policy, one option (perhaps the most common) is to

1. compute v*
2. solve for a v*-greedy policy

The advantage is that, once we get to the second step, we are solving a one-dimensional optimization problem: the problem on the right-hand side of (4.58)

This is much easier than an infinite-dimensional optimization problem, which is what we started out with


(An infinite sequence {c_t} is a point in an infinite-dimensional space)

In fact step 2 is almost trivial once v* is obtained

For this reason, most of our focus is on the first step: how to obtain the value function

Value Function Iteration

The value function v* can be obtained by an iterative technique: starting with a guess (some initial function w) and successively improving it

The improvement step involves applying an operator (a mathematical term for a function that takes a function as an input and returns a new function as an output)

The operator in question is the Bellman operator

The Bellman operator for this problem is a map T sending function w into function Tw via

    Tw(k) := max_{0 ≤ c ≤ f(k)} {u(c) + β w(f(k) − c)}    (4.60)

Now let w be any continuous bounded function

It is known that iteratively applying T from initial condition w produces a sequence of functions w, Tw, T(Tw) = T²w, . . . that converges uniformly to v*

(For a proof see, for example, lemma 10.1.20 of EDTC)

This convergence will be prominent in our numerical experiments

Unbounded Utility

The theoretical results stated above assume that the utility function is bounded

In practice economists often work with unbounded utility functions

For utility functions that are bounded below (but possibly unbounded above), a clean and comprehensive theory now exists

(Section 12.2 of EDTC provides one exposition)

For utility functions that are unbounded both below and above the situation is more complicated

For recent work on deterministic problems, see, for example, [Kamihigashi2012] or [MV2010]

In this lecture we will use both bounded and unbounded utility functions without dwelling on the theory

Computation
Let's now look at computing the value function and the optimal policy

Fitted Value Iteration

The first step is to compute the value function by iterating with the Bellman operator

In theory, the algorithm is as follows


1. Begin with a function w, an initial condition
2. Solving (4.60), obtain the function Tw
3. Unless some stopping condition is satisfied, set w = Tw and go to step 2

However, there is a problem we must confront before we implement this procedure: The iterates can neither be calculated exactly nor stored on a computer

To see the issue, consider (4.60)

Even if w is a known function, unless Tw can be shown to have some special structure, the only way to store this function is to record the value Tw(k) for every k ∈ ℝ₊

Clearly this is impossible

What we will do instead is use fitted value function iteration

The procedure is to record the value of the function Tw at only finitely many grid points {k_1, . . . , k_I} ⊂ ℝ₊, and reconstruct it from this information when required

More precisely, the algorithm will be

1. Begin with an array of values {w_1, . . . , w_I}, typically representing the values of some initial function w on the grid points {k_1, . . . , k_I}
2. build a function ŵ on the state space ℝ₊ by interpolating the points {w_1, . . . , w_I}
3. By repeatedly solving (4.60), obtain and record the value T ŵ(k_i) on each grid point k_i
4. Unless some stopping condition is satisfied, set {w_1, . . . , w_I} = {T ŵ(k_1), . . . , T ŵ(k_I)} and go to step 2

How should we go about step 2?

This is a problem of function approximation, and there are many ways to approach it

What's important here is that the function approximation scheme must not only produce a good approximation to Tw, but also combine well with the broader iteration algorithm described above

One good choice in both respects is continuous piecewise linear interpolation (see this paper for further discussion)

The next figure illustrates piecewise linear interpolation of an arbitrary function on grid points 0, 0.2, 0.4, . . . , 1 (source code)

Another advantage of piecewise linear interpolation is that it preserves useful shape properties such as monotonicity and concavity / convexity

A First Pass Implementation

Let's now look at an implementation of fitted value function iteration using Python

In the example below,

- f(k) = k^α with α = 0.65
- u(c) = ln c and β = 0.95


As is well-known (see RMT3, section 3.1.2), for this particular problem an exact analytical solution is available, with

    v*(k) = c_1 + c_2 ln k    (4.61)

for

    c_1 := [ln(1 − αβ) + (αβ / (1 − αβ)) ln(αβ)] / (1 − β)  and  c_2 := α / (1 − αβ)

At this stage, our only aim is to see if we can replicate this solution numerically, using fitted value function iteration

Here's a first-pass solution, the details of which are explained below

The code can be found in file optgrowth_v0.py from the main repository
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: optgrowth_v0.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 A first pass at solving the optimal growth problem via value function iteration, provided as an introduction to the techniques. A more general version is provided in optgrowth.py. """ from __future__ import division # Omit for Python 3.x import matplotlib.pyplot as plt import numpy as np from numpy import log from scipy.optimize import fminbound from scipy import interp ## Primitives and grid alpha = 0.65 beta=0.95 grid_max=2


grid_size = 150
grid = np.linspace(1e-6, grid_max, grid_size)

## Exact solution
ab = alpha * beta
c1 = (log(1 - ab) + log(ab) * ab / (1 - ab)) / (1 - beta)
c2 = alpha / (1 - ab)

def v_star(k):
    return c1 + c2 * log(k)

def bellman_operator(w):
    """
    The approximate Bellman operator, which computes and returns the
    updated value function Tw on the grid points.

        * w is a flat NumPy array with len(w) = len(grid)

    The vector w represents the value of the input function on the grid
    points.
    """
    # === Apply linear interpolation to w === #
    Aw = lambda x: interp(x, grid, w)
    # === set Tw[i] equal to max_c { log(c) + beta w(f(k_i) - c)} === #
    Tw = np.empty(grid_size)
    for i, k in enumerate(grid):
        objective = lambda c: - log(c) - beta * Aw(k**alpha - c)
        c_star = fminbound(objective, 1e-6, k**alpha)
        Tw[i] = - objective(c_star)
    return Tw

# === If file is run directly, not imported, produce figure === #
if __name__ == '__main__':
    w = 5 * log(grid) - 25  # An initial condition -- fairly arbitrary
    n = 35
    fig, ax = plt.subplots()
    ax.set_ylim(-40, -20)
    ax.set_xlim(np.min(grid), np.max(grid))
    ax.plot(grid, w, color=plt.cm.jet(0), lw=2, alpha=0.6,
            label='initial condition')
    for i in range(n):
        w = bellman_operator(w)
        ax.plot(grid, w, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
    ax.plot(grid, v_star(grid), 'k-', lw=2, alpha=0.8,
            label='true value function')
    ax.legend(loc='upper left')
    plt.show()

Running the code produces the following figure

The curves in this picture represent

1. the first 35 functions generated by the fitted value function iteration algorithm described above, with hotter colors given to higher iterates


2. the true value function as specified in (4.61), drawn in black

The sequence of iterates converges towards v*

If we increase n and run again we see further improvement: the next figure shows n = 75

Incidentally, it is true that knowledge of the functional form of v* for this model has influenced our choice of the initial condition
w = 5 * log(grid) - 25

In more realistic problems such information is not available, and convergence will probably take longer

Comments on the Code

The function bellman_operator implements steps 2-3 of the fitted value function algorithm discussed above

Linear interpolation is performed by SciPy's interp function

Like the rest of SciPy's numerical solvers, fminbound minimizes its objective, so we use the identity max_x f(x) = − min_x (− f(x)) to solve (4.60)

The line if __name__ == '__main__': is very common, and operates as follows

- If the file is run as the result of an import statement in another file, the clause evaluates to False, and the code block is not executed
- If the file is run directly as a script, the clause evaluates to True, and the code block is executed


To see how this trick works, suppose we have a file in our current working directory called test_file.py that contains the single line
print(__name__)

Now consider the following, executed in IPython


In [1]: run test_file.py
__main__

In [2]: import test_file
test_file

Hopefully you can now see how it works

The benefit is that we can now import the functionality in optgrowth_v0.py without necessarily generating the figure

The Policy Function

To compute an approximate optimal policy, we run the fitted value function algorithm until approximate convergence

Taking the function so produced as an approximation to v*, we then compute the (approximate) v*-greedy policy

For this particular problem, the optimal consumption policy has the known analytical solution σ(k) = (1 − αβ) k^α

The next figure compares the numerical solution to this exact solution

In the three figures, the approximation to v* is obtained by running the loop in the fitted value function algorithm 2, 4 and 6 times respectively


Even with as few as 6 iterates, the numerical result is quite close to the true policy
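The exact policy can also be written down and checked directly against the feasibility requirement (4.53); a quick sketch using the lecture's parameter values:

```python
import numpy as np

alpha, beta = 0.65, 0.95
grid = np.linspace(1e-6, 2, 150)          # same grid as optgrowth_v0.py

# exact optimal consumption policy: sigma(k) = (1 - alpha*beta) * k**alpha
sigma_exact = (1 - alpha * beta) * grid**alpha

# feasibility (4.53): consumption never exceeds output f(k) = k**alpha
feasible = bool(np.all(sigma_exact <= grid**alpha))
```

Feasibility holds automatically here because 1 − αβ < 1, so the agent always saves the fraction αβ of output.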

Exercise 1 asks you to reproduce this figure, although you should read the next section first

Writing Reusable Code

The title of this section might sound uninteresting and a departure from our topic, but it's equally important, if not more so

It's understandable that many economists never consider the basic principles of software development, preoccupied as they are with the applied aspects of trying to implement their projects

However, in programming as in many things, success tends to find those who focus on what is important, not just what is urgent

The Danger of Copy and Paste

For computing the value function of the particular growth model studied above, the code we have already written (in file optgrowth_v0.py, shown here) is perfectly fine


However, suppose that we now want to solve a different growth model, with different technology and preferences

Probably we want to keep our existing code, so let's follow our first instinct and

1. copy the contents of optgrowth_v0.py to a new file
2. then make the necessary changes

Now let's suppose that we repeat this process again and again over several years, so we now have many similar files

(And perhaps we're doing similar things with other projects, leading to hundreds of specialized and partially related Python files lying around our file system)

There are several potential problems here

Problem 1 First, if we now realize we've been making some small but fundamental error with our dynamic programming all this time, we have to modify all these different files

And now we realize that we don't quite remember which files they were, and where exactly we put them...

So we fix all the ones we can find, spending a few hours in the process, since each implementation is slightly different and takes time to remember, and leave the ones we can't

Now, 6 weeks later, we need to use one of these files

But is file X one that we have fixed, or is it not?

In this way, our code base becomes a mess, with related functionality scattered across many files, and errors creeping into them

Problem 2 A second issue here is that since all these files are specialized and might not be used again, there's little incentive to invest in writing them cleanly and efficiently

DRY

The preceding discussion leads us to one of the most fundamental principles of code development: don't repeat yourself

To the extent that it's practical, always strive to

- write code that is abstract and generic in order to facilitate reuse
- try to ensure that each distinct logical concept is repeated in your code base as few times as possible

To this end, we are now going to rewrite our solution to the optimal growth problem given in optgrowth_v0.py (shown above) with the intention of producing a more generic version

While some aspects of this exercise might seem like overkill, the principles are important, and easy to illustrate in the context of the current problem


Implementation 2

In writing our second implementation, we want our function bellman_operator to be able to handle a wider class of models

In particular, we don't want model specifics hardwired into this function

Instead, we would like to pass bellman_operator a description of a model (technology, preferences, etc.), and have it act as the Bellman operator associated with that model

It will be convenient to bundle the description of a model into a single object that we can pass to any function that needs it

The attributes of this model object will be the model features, such as technology, preferences and discount factor

This idea is implemented in the code below, with model objects being instances of a class called growthModel
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: optgrowth.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Solving the optimal growth problem via value function iteration. """ from __future__ import division # Omit for Python 3.x import numpy as np from scipy.optimize import fminbound from scipy import interp class growthModel: """ This class is just a "struct" to hold the collection of primitives defining the growth model. The default values are f(k) = k**alpha, i.e, Cobb-douglas production function u(c) = ln(c), i.e, log utility See the __init__ function for details """ def __init__(self, f=lambda k: k**0.65, beta=0.95, u=np.log, grid_max=2, grid_size=150): """ Parameters: * f is the production function and u is the utility function * beta is the discount factor, a scalar in (0, 1) * grid_max and grid_size describe the grid """ self.u, self.f, self.beta = u, f, beta self.grid = np.linspace(1e-6, grid_max, grid_size)


def bellman_operator(gm, w):
    """
    The approximate Bellman operator, which computes and returns the
    updated value function Tw on the grid points.

    Parameters:

        * gm is an instance of the growthModel class
        * w is a flat NumPy array with len(w) = len(grid)

    The vector w represents the value of the input function on the grid
    points.
    """
    # === Apply linear interpolation to w === #
    Aw = lambda x: interp(x, gm.grid, w)
    # === set Tw[i] equal to max_c { u(c) + beta w(f(k_i) - c)} === #
    Tw = np.empty(len(w))
    for i, k in enumerate(gm.grid):
        objective = lambda c: - gm.u(c) - gm.beta * Aw(gm.f(k) - c)
        c_star = fminbound(objective, 1e-6, gm.f(k))
        Tw[i] = - objective(c_star)
    return Tw

def compute_greedy(gm, w):
    """
    Compute the w-greedy policy on the grid points.

    Parameters:

        * gm is an instance of the growthModel class
        * w is a flat NumPy array with len(w) = len(grid)
    """
    # === Apply linear interpolation to w === #
    Aw = lambda x: interp(x, gm.grid, w)
    # === set sigma[i] equal to argmax_c { u(c) + beta w(f(k_i) - c)} === #
    sigma = np.empty(len(w))
    for i, k in enumerate(gm.grid):
        objective = lambda c: - gm.u(c) - gm.beta * Aw(gm.f(k) - c)
        sigma[i] = fminbound(objective, 1e-6, gm.f(k))
    return sigma

As discussed above, instances of growthModel are simple objects that store information about the primitives (as their attributes)

(Review this lecture if you have forgotten the syntax for class definitions)

The two functions bellman_operator and compute_greedy both take these instances as their first argument, providing them with information about the model


Of course we could omit the growthModel class and just pass this information to bellman_operator and compute_greedy as a list of separate arguments

For example
Tw = bellman_operator(f, beta, u, grid_max, grid_size, w)

This approach is also fine, but a little more error prone (which argument comes third again?), particularly when you need to pass the same information to various other functions

It also increases the number of global variables in the module

These kinds of problems become more acute as we move on to more complex programming problems

Hence the decision to bundle the model description into a single object

Iteration

The next thing we need to do is implement iteration of the Bellman operator

Since iteratively applying an operator is something we'll do a lot of, let's write this as generic, reusable code

Our code is written in the file compute_fp.py from the main repository, and displayed below

As currently written, the code continues iteration until one of two stopping conditions holds

1. Successive iterates become sufficiently close together, in the sense that the maximum deviation between them falls below error_tol
2. The number of iterations exceeds max_iter
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: compute_fp.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Compute the fixed point of a given operator T, starting from specified initial condition v. """ import numpy as np def compute_fixed_point(T, specs, v, error_tol=1e-3, max_iter=50, verbose=1): """ Computes and returns T^k v, where T is an operator, v is an initial condition and k is the number of iterates. Provided that T is a contraction mapping or similar, T^k v will be an approximation to the fixed point. The convention for using this function is that T can be called as new_v = T(specs, v). """


    iterate = 0
    error = error_tol + 1
    while iterate < max_iter and error > error_tol:
        new_v = T(specs, v)
        iterate += 1
        error = np.max(np.abs(new_v - v))
        if verbose:
            print "Computed iterate %d with error %f" % (iterate, error)
        v = new_v
    return v

Here's a simple IPython session that uses the code provided above to compute the optimal consumption policy

In [1]: from optgrowth import *

In [2]: from compute_fp import *

In [3]: gm = growthModel()  # Uses default parameters

In [4]: w = 5 * gm.u(gm.grid) - 25  # Initial condition for iteration

In [5]: v_star = compute_fixed_point(bellman_operator, gm, w)  # Compute value function

In [6]: sigma = compute_greedy(gm, v_star)  # Compute optimal policy

Exercises
Exercise 1 Replicate the optimal policy figure shown above

Use the same parameters and initial condition found in optgrowth.py

Solution: View solution

Exercise 2 Once an optimal consumption policy σ is given, the dynamics for the capital stock follow (4.55)

The next figure shows the first 25 elements of this sequence for three different discount factors (and hence three different policies)

In each sequence, the initial condition is k_0 = 0.1

The discount factors are discount_factors = (0.9, 0.94, 0.98)

Otherwise, the parameters and primitives are the same as found in optgrowth.py

Replicate the figure

Solution: View solution
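To get a feel for Exercise 2, note that simulating (4.55) is a one-line recursion once the policy is fixed. A self-contained sketch that substitutes the exact policy σ(k) = (1 − αβ)k^α for a numerically computed one:

```python
import numpy as np

alpha, k0, T = 0.65, 0.1, 25

def capital_path(beta):
    # k_{t+1} = f(k_t) - sigma(k_t) = alpha * beta * k_t**alpha
    k = np.empty(T)
    k[0] = k0
    for t in range(T - 1):
        k[t + 1] = alpha * beta * k[t]**alpha
    return k

paths = {beta: capital_path(beta) for beta in (0.9, 0.94, 0.98)}
# more patient agents (higher beta) accumulate more capital
```

Each path converges to the steady state k* = (αβ)^{1/(1−α)}, which is increasing in β.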


4.9 LQ Control Problems

Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have found applications in almost every scientific field

This lecture provides an introduction to LQ control and its economic applications

As we will see, LQ systems have a simple structure that makes them an excellent workhorse for a wide variety of economic problems

Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than it may appear initially

These themes are addressed repeatedly below

Mathematically, LQ control problems are closely related to the Kalman filter, although we won't pursue the deeper connections in this lecture

In reading what follows, it will be useful to have some familiarity with

- matrix manipulations
- vectors of random variables
- dynamic programming and the Bellman equation (see for example this lecture and this lecture)

For additional reading on LQ control, see, for example,

- RMT3, chapter 5
- [HansenSargent2008], chapter 4
- [HernandezLermaLasserre1996], section 3.5


In order to focus on computation, we leave longer proofs to these sources (while trying to provide as much intuition as possible)

Introduction
The linear part of LQ is a linear law of motion for the state, while the quadratic part refers to preferences Lets begin with the former, move on to the latter, and then put them together into an optimization problem The Law of Motion Let xt be a vector describing the state of some economic system Suppose that xt follows a linear law of motion given by xt+1 = Axt + But + Cwt+1 , Here ut is a control vector, incorporating choices available to a decision maker confronting the current state xt {wt } is an uncorrelated zero mean shock process satisfying Ewt wt = I , where the right-hand side is the identity matrix Regarding the dimensions xt is n 1, A is n n ut is k 1, B is n k wt is j 1, C is n j Example 1 Consider a household budget constraint given by bt + 1 + c t = ( 1 + r ) bt + y t Here bt is assets, r is a xed interest rate, ct is current consumption, and yt is current non-nancial income If we suppose that {yt } is uncorrelated and N (0, 2 ), then, taking {wt } to be standard normal, we can write the system as bt + 1 = ( 1 + r ) bt c t + w t + 1 This is clearly a special case of (4.62), with assets being the state and consumption being the control Example 2 One unrealistic feature of the previous model is that non-nancial income has a zero mean and is often negative This can easily be overcome by adding a sufciently large mean Hence in this example we take yt = wt+1 + for some positive real number T HOMAS S ARGENT AND J OHN S TACHURSKI February 5, 2014 t = 0, 1, 2, . . . (4.62)


Another alteration that's useful to introduce (we'll see why soon) is to change the control variable from consumption to the deviation of consumption from some ideal quantity c̄

(Most parameterizations will be such that c̄ is large relative to the amount of consumption that is attainable in each period, and hence the household wants to increase consumption)

For this reason, we now take our control to be u_t := c_t − c̄

In terms of these variables, the budget constraint b_{t+1} = (1 + r) b_t − c_t + y_t becomes

    b_{t+1} = (1 + r) b_t − u_t − c̄ + σ w_{t+1} + μ    (4.63)

How can we write this new system in the form of equation (4.62)?

If, as in the previous example, we take b_t as the state, then we run into a problem: the law of motion contains some constant terms on the right-hand side

This means that we are dealing with an affine function, not a linear one (recall this discussion)

Fortunately, we can easily circumvent this problem by adding an extra state variable

In particular, if we write

    [ b_{t+1} ]   [ 1 + r   −c̄ + μ ] [ b_t ]   [ −1 ]       [ σ ]
    [    1    ] = [   0        1   ] [  1  ] + [  0 ] u_t + [ 0 ] w_{t+1}    (4.64)

then the first row is equivalent to (4.63)

Moreover, the model is now linear, and can be written in the form of (4.62) by setting

    x_t := [ b_t ] ,  A := [ 1 + r   −c̄ + μ ] ,  B := [ −1 ] ,  C := [ σ ]    (4.65)
           [  1  ]         [   0        1   ]         [  0 ]        [ 0 ]

In effect, we've bought ourselves linearity by adding another state

Preferences

In the LQ model, the aim is to minimize a flow of losses, where time-t loss is given by the quadratic expression

    x_t' R x_t + u_t' Q u_t    (4.66)

Here

- R is assumed to be n × n, symmetric and nonnegative definite
- Q is assumed to be k × k, symmetric and positive definite

Note: In fact, for many economic problems, the definiteness conditions on R and Q can be relaxed. It is sufficient that certain submatrices of R and Q be nonnegative definite. See [HansenSargent2008] for details
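Setting up the matrices in (4.65) with NumPy takes a few lines; a sketch in which the numerical values of r, c̄, μ and σ are illustrative assumptions:

```python
import numpy as np

r, c_bar, mu, sigma = 0.05, 2.0, 1.0, 0.25   # illustrative values (assumptions)

# state x_t = (b_t, 1)', control u_t = c_t - c_bar, as in (4.65)
A = np.array([[1 + r, -c_bar + mu],
              [0.0,    1.0]])
B = np.array([[-1.0],
              [ 0.0]])
C = np.array([[sigma],
              [ 0.0]])

x0 = np.array([[3.0], [1.0]])   # b_0 = 3, plus the constant second state
u0 = np.array([[0.5]])          # consume c_bar + 0.5 this period
w1 = np.array([[0.0]])          # zero shock, to check the deterministic part
x1 = np.dot(A, x0) + np.dot(B, u0) + np.dot(C, w1)   # law of motion (4.62)
# the first row of x1 reproduces (4.63), and the second stays fixed at 1
```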


Example 1  A very simple example that satisfies these assumptions is to take R and Q to be identity matrices, so that current loss is

    x_t' I x_t + u_t' I u_t = ||x_t||^2 + ||u_t||^2

Thus, for both the state and the control, loss is measured as squared distance from the origin.

(In fact the general case (4.66) can also be understood in this way, but with R and Q identifying other, non-Euclidean, notions of "distance" from the zero vector.)

Intuitively, we can often think of the state x_t as representing deviation from a target, such as

- deviation of inflation from some target level
- deviation of a firm's capital stock from some desired quantity

The aim is to put the state close to the target, while using controls parsimoniously.

Example 2  In the household problem studied above, setting R = 0 and Q = 1 yields preferences

    x_t' R x_t + u_t' Q u_t = u_t^2 = (c_t - c̄)^2

Under this specification, the household's current loss is the squared deviation of consumption from the ideal level c̄.
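A minimal numerical illustration of the loss (4.66) in the identity-matrix case (the state and control values below are arbitrary):

```python
import numpy as np

x = np.array([1.0, -2.0])   # state (deviation from target)
u = np.array([0.5])         # control
R = np.eye(2)               # state loss weights
Q = np.eye(1)               # control loss weights

loss = x @ R @ x + u @ Q @ u
# With R = I and Q = I this is just squared distance from the origin
assert np.isclose(loss, np.sum(x**2) + np.sum(u**2))
assert np.isclose(loss, 5.25)   # 1 + 4 + 0.25
```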

Optimality: Finite Horizon

Let's now be precise about the optimization problem we wish to consider, and look at how to solve it.

The Objective  We will begin with the finite horizon case, with terminal time T in N.

In this case, the aim is to choose a sequence of controls {u_0, ..., u_{T-1}} to minimize the objective

    E \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\}    (4.67)

subject to the law of motion (4.62) and initial state x_0.

The new objects introduced here are β and the matrix R_f. The scalar β is the discount factor, while x' R_f x gives terminal loss associated with state x.

Comments:

- We assume R_f to be n x n, symmetric and nonnegative definite
- We allow β = 1, and hence include the undiscounted case
- x_0 may itself be random, in which case we require it to be independent of the shock sequence w_1, ..., w_T

Information  There's one constraint we've neglected to mention so far, which is that the decision maker who solves this LQ problem knows only the present and the past, not the future.

To clarify this point, consider the sequence of controls {u_0, ..., u_{T-1}}. When choosing these controls, the decision maker is permitted to take into account the effects of the shocks {w_1, ..., w_T} on the system. However, it is typically assumed, and will be assumed here, that the time-t control u_t can only be made with knowledge of past and present shocks.

The fancy measure-theoretic way of saying this is that u_t must be measurable with respect to the σ-algebra generated by x_0, w_1, w_2, ..., w_t. This is in fact equivalent to stating that u_t can be written in the form u_t = g_t(x_0, w_1, w_2, ..., w_t) for some Borel measurable function g_t.

(Just about every function that's useful for applications is Borel measurable, so, for the purposes of intuition, you can read that last phrase as "for some function g_t".)

Now note that x_t will ultimately depend on the realizations of x_0, w_1, w_2, ..., w_t. In fact it turns out that x_t summarizes all the information about these historical shocks that the decision maker needs to set controls optimally. More precisely, it can be shown that any optimal control u_t can always be written as a function of the current state alone.

Hence in what follows we restrict attention to control policies (i.e., functions) of the form u_t = g_t(x_t).

Actually, the preceding discussion applies to all standard dynamic programming problems. What's special about the LQ case is that, as we shall soon see, the optimal u_t turns out to be a linear function of x_t.

Solution  To solve the finite horizon LQ problem we can use a dynamic programming strategy based on backwards induction that is conceptually similar to the approach adopted in this lecture.

For reasons that will soon become clear, we first introduce the notation J_T(x) := x' R_f x.

Now consider the problem of the decision maker in the second to last period. In particular, let the time be T - 1, and suppose that the state is x_{T-1}. The decision maker must trade off current and (discounted) final losses, and hence solves

    \min_u \{ x_{T-1}' R x_{T-1} + u' Q u + \beta E J_T(A x_{T-1} + B u + C w_T) \}

At this stage, it is convenient to define the function

    J_{T-1}(x) := \min_u \{ x' R x + u' Q u + \beta E J_T(A x + B u + C w_T) \}    (4.68)

The function J_{T-1} will be called the T - 1 value function, and J_{T-1}(x) can be thought of as representing total "loss-to-go" from state x at time T - 1 when the decision maker behaves optimally.


Now let's step back to T - 2.

For a decision maker at T - 2, the value J_{T-1}(x) plays a role analogous to that played by the terminal loss J_T(x) = x' R_f x for the decision maker at T - 1. That is, J_{T-1}(x) summarizes the future loss associated with moving to state x.

The decision maker chooses her control u to trade off current loss against future loss, where

- the next period state is x_{T-1} = A x_{T-2} + B u + C w_{T-1}, and hence depends on the choice of current control
- the "cost" of landing in state x_{T-1} is J_{T-1}(x_{T-1})

Her problem is therefore

    \min_u \{ x_{T-2}' R x_{T-2} + u' Q u + \beta E J_{T-1}(A x_{T-2} + B u + C w_{T-1}) \}

Letting

    J_{T-2}(x) := \min_u \{ x' R x + u' Q u + \beta E J_{T-1}(A x + B u + C w_{T-1}) \}

the pattern for backwards induction is now clear. In particular, we define a sequence of value functions {J_0, ..., J_T} via

    J_{t-1}(x) = \min_u \{ x' R x + u' Q u + \beta E J_t(A x + B u + C w_{t+1}) \}    and    J_T(x) = x' R_f x

The first equality is the Bellman equation from dynamic programming theory specialized to the finite horizon LQ problem.

Now that we have {J_0, ..., J_T}, we can obtain the optimal controls.

As a first step, let's find out what the value functions look like. It turns out that every J_t has the form J_t(x) = x' P_t x + d_t, where P_t is an n x n matrix and d_t is a constant.

We can show this by induction, starting from P_T := R_f and d_T = 0.

Using this notation, (4.68) becomes

    J_{T-1}(x) := \min_u \{ x' R x + u' Q u + \beta E (A x + B u + C w_T)' P_T (A x + B u + C w_T) \}    (4.69)

To obtain the minimizer, we can take the derivative of the r.h.s. with respect to u and set it equal to zero. Applying the relevant rules of matrix calculus, this gives

    u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A x    (4.70)

Plugging this back into (4.69) and rearranging yields J_{T-1}(x) := x' P_{T-1} x + d_{T-1}, where

    P_{T-1} := R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A    (4.71)

and

    d_{T-1} := \beta \, \mathrm{trace}(C' P_T C)    (4.72)

(The algebra is a good exercise; we'll leave it up to you.)

If we continue working backwards in this manner, it soon becomes clear that J_t(x) := x' P_t x + d_t as claimed, where {P_t} and {d_t} satisfy the recursions

    P_{t-1} := R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A    with    P_T = R_f    (4.73)

and

    d_{t-1} := \beta (d_t + \mathrm{trace}(C' P_t C))    with    d_T = 0    (4.74)

Recalling (4.70), the minimizers from these backward steps are

    u_t = -F_t x_t    where    F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A    (4.75)
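The recursions for P_t and F_t translate almost line for line into NumPy. The sketch below (with arbitrary illustrative matrices; the constant d_t, which requires C, is omitted for brevity) runs the backward pass and collects the policy matrices:

```python
import numpy as np

def backward_induction(A, B, Q, R, Rf, beta, T):
    """Run the P and F recursions back from t = T, returning F_0, ..., F_{T-1}."""
    P = Rf
    Fs = []
    for _ in range(T):
        S1 = Q + beta * B.T @ P @ B                 # Q + beta B'PB
        S2 = beta * B.T @ P @ A                     # beta B'PA
        F = np.linalg.solve(S1, S2)                 # policy matrix F_t
        # Shift P one step back in time
        P = R - S2.T @ np.linalg.solve(S1, S2) + beta * A.T @ P @ A
        Fs.append(F)
    Fs.reverse()                                    # now Fs[t] is F_t
    return Fs, P                                    # P is the time-0 matrix

# Illustrative scalar example (all matrices 1 x 1)
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); Rf = np.array([[1.0]])
Fs, P0 = backward_induction(A, B, Q, R, Rf, beta=0.95, T=10)
```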

These are the linear optimal control policies we discussed above. In particular, the sequence of controls given by (4.75) and (4.62) solves our finite horizon LQ problem.

Rephrasing this more precisely, the sequence u_0, ..., u_{T-1} given by

    u_t = -F_t x_t    with    x_{t+1} = (A - B F_t) x_t + C w_{t+1}    (4.76)

for t = 0, ..., T - 1 attains the minimum of (4.67) subject to our constraints.

An Application  Early Keynesian models assumed that households have a constant marginal propensity to consume from current income. Data contradicted the constancy of the marginal propensity to consume.

In response, Milton Friedman, Franco Modigliani and many others built models based on a consumer's preference for a stable consumption stream. (See, for example, [Friedman1956] or [ModiglianiBrumberg1954].)

One property of those models is that households purchase and sell financial assets to make consumption streams smoother than income streams. The household savings problem outlined above captures these ideas.

The optimization problem for the household is to choose a consumption sequence in order to minimize

    E \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q b_T^2 \right\}    (4.77)

subject to the sequence of budget constraints b_{t+1} = (1 + r) b_t - c_t + y_t, t >= 0.

Here q is a large positive constant, the role of which is to induce the consumer to target zero debt at the end of her life.

(Without such a constraint, the optimal choice is to choose c_t = c̄ in each period, letting assets adjust accordingly.)


As before we set y_t = σ w_{t+1} + μ and u_t := c_t - c̄, after which the constraint can be written as in (4.63).

We saw how this constraint could be manipulated into the LQ formulation x_{t+1} = A x_t + B u_t + C w_{t+1} by setting x_t = (b_t, 1)' and using the definitions in (4.65).

To match with this state and control, the objective function (4.77) can be written in the form of (4.67) by choosing

    Q := 1,    R := \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},    and    R_f := \begin{bmatrix} q & 0 \\ 0 & 0 \end{bmatrix}

Now that the problem is expressed in LQ form, we can proceed to the solution by applying (4.73) and (4.75).

After generating shocks w_1, ..., w_T, the dynamics for assets and consumption can be simulated via (4.76). We provide code for all these operations below.

The following figure was computed using this code, with r = 0.05, β = 1/(1 + r), c̄ = 2, μ = 1, σ = 0.25, T = 45 and q = 10^6. The shocks {w_t} were taken to be iid and standard normal.

The top panel shows the time path of consumption c_t and income y_t in the simulation. As anticipated by the discussion on consumption smoothing, the time path of consumption is much smoother than that for income.

(But note that consumption becomes more irregular towards the end of life, when the zero final asset requirement impinges more on consumption choices.)

The second panel in the figure shows that the time path of assets b_t is closely correlated with cumulative unanticipated income, where the latter is defined as

    z_t := \sum_{j=0}^{t} \sigma w_j

A key message is that unanticipated windfall gains are saved rather than consumed, while unanticipated negative shocks are met by reducing assets.

(Again, this relationship breaks down towards the end of life due to the zero final asset requirement.)

These results are relatively robust to changes in parameters. For example, let's increase β from 1/(1 + r) ≈ 0.952 to 0.96 while keeping other parameters fixed. This consumer is slightly more patient than the last one, and hence puts relatively more weight on later consumption values.

A simulation is shown below. We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the middle periods to fund rising consumption.

However, the essential features are the same: consumption is smooth relative to income, and assets are strongly positively correlated with cumulative unanticipated income.


Extensions to the LQ Model


Let's now consider a number of standard extensions to the LQ problem treated above.

Nonstationary Parameters  In some settings it can be desirable to allow A, B, C, R and Q to depend on t. For the sake of simplicity, we've chosen not to treat this extension in our implementation given below.

However, the loss of generality is not as large as you might first imagine. In fact, we can tackle many nonstationary models from within our implementation by suitable choice of state variables. One illustration is given below.

For further examples and a more systematic treatment, see [HansenSargent2013], section 2.4.

Adding a Cross-Product Term  In some LQ problems, preferences include a cross-product term u_t' N x_t, so that the objective function becomes

    E \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\}    (4.78)

Our results extend to this case in a straightforward way. The sequence {P_t} from (4.73) becomes

    P_{t-1} := R - (\beta B' P_t A + N)' (Q + \beta B' P_t B)^{-1} (\beta B' P_t A + N) + \beta A' P_t A    with    P_T = R_f    (4.79)

The policies in (4.75) are modified to

    u_t = -F_t x_t    where    F_t := (Q + \beta B' P_{t+1} B)^{-1} (\beta B' P_{t+1} A + N)    (4.80)

The sequence {d_t} is unchanged from (4.74). We leave interested readers to confirm these results (the calculations are long but not overly difficult).

In the Python implementation below the cross-product term is omitted. While this might appear to reduce generality, in fact certain tricks can be employed to solve LQ problems where N is nonzero using our implementation. For details see [HansenSargent2008], chapter 4.

Infinite Horizon  Finally, we consider the infinite horizon case, with unchanged dynamics and objective function given by

    E \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t) \right\}    (4.81)


In the infinite horizon case, optimal policies can depend on time only if time itself is a component of the state vector x_t. In other words, there exists a fixed matrix F such that u_t = -F x_t for all t.

This stationarity is intuitive: after all, the decision maker faces the same infinite horizon at every stage, with only the current state changing. Not surprisingly, P and d are also constant.

The stationary matrix P is given by the fixed point of (4.73). Equivalently, it is the solution P to the discrete time algebraic Riccati equation

    P = R - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A + \beta A' P A    (4.82)

Equation (4.82) is also called the LQ Bellman equation, and the map that sends a given P into the right-hand side of (4.82) is called the LQ Bellman operator.

The stationary optimal policy for this model is

    u = -F x    where    F := (Q + \beta B' P B)^{-1} \beta B' P A    (4.83)

The sequence {d_t} from (4.74) is replaced by the constant value

    d := \mathrm{trace}(C' P C) \frac{\beta}{1 - \beta}    (4.84)

An example infinite horizon problem is treated below.
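A minimal sketch of computing the fixed point of (4.82) by simple iteration on the LQ Bellman operator (in practice a dedicated Riccati solver, such as the riccati.dare routine used below, is faster and more robust):

```python
import numpy as np

def stationary_P(A, B, Q, R, beta, tol=1e-10, max_iter=10_000):
    """Iterate the map in (4.82) until (approximate) convergence."""
    P = R
    for _ in range(max_iter):
        S1 = Q + beta * B.T @ P @ B
        S2 = beta * B.T @ P @ A
        P_new = R - S2.T @ np.linalg.solve(S1, S2) + beta * A.T @ P @ A
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P

# Scalar illustration (arbitrary values)
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
P = stationary_P(A, B, Q, R, beta=0.95)

# P should satisfy the Riccati equation (4.82) up to the tolerance
S1 = Q + 0.95 * B.T @ P @ B
S2 = 0.95 * B.T @ P @ A
resid = R - S2.T @ np.linalg.solve(S1, S2) + 0.95 * A.T @ P @ A - P
assert np.max(np.abs(resid)) < 1e-8
```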

Implementation
Let's now put together some code for solving finite and infinite horizon linear quadratic control problems. Our code is in the file lqcontrol.py from the main repository. Description and clarifications are given below.
import numpy as np from numpy import dot from scipy.linalg import solve import riccati class LQ: """ This class is for analyzing linear quadratic optimal control problems of either the infinite horizon form min E sum_{t=0}^{infty} beta^t r(x_t, u_t) with r(x_t, u_t) := x_t' R x_t + u_t' Q u_t


or the finite horizon form

    min E sum_{t=0}^{T-1} beta^t r(x_t, u_t) + x_T' R_f x_T

Both are minimized subject to the law of motion

    x_{t+1} = A x_t + B u_t + C w_{t+1}

Here x is n x 1, u is k x 1, w is j x 1 and the matrices are conformable for these dimensions.  The sequence {w_t} is assumed to be white noise, with zero mean and E w_t w_t' = I, the j x j identity.

If C is not supplied as a parameter, the model is assumed to be deterministic (and C is set to a zero matrix of appropriate dimension).

For this model, the time t value (i.e., cost-to-go) function V_t takes the form

    x' P_T x + d_T

and the optimal policy is of the form u_T = -F_T x_T.  In the infinite horizon case, V, P, d and F are all stationary.
"""

def __init__(self, Q, R, A, B, C=None, beta=1, T=None, Rf=None): """ Provides parameters describing the LQ model Parameters ============ * * * * * * R and Rf are n x n, symmetric and nonnegative definite Q is k x k, symmetric and positive definite A is n x n B is n x k C is n x j, or None for a deterministic model beta is a scalar in (0, 1] and T is an int

All arguments should be scalars or NumPy ndarrays. Here T is the time horizon. If T is not supplied, then the LQ problem is assumed to be infinite horizon. If T is supplied, then the terminal reward matrix Rf should also be specified. For interpretation of the other parameters, see the docstring of the LQ class. We also initialize the pair (P, d) that represents the value function via V(x) = x' P x + d, and the policy function matrix F. """ # == Make sure all matrices can be treated as 2D arrays == # converter = lambda X: np.atleast_2d(np.asarray(X, dtype='float32')) self.A, self.B, self.Q, self.R = map(converter, (A, B, Q, R)) # == Record dimensions == #


self.k, self.n = self.Q.shape[0], self.R.shape[0]
self.beta = beta
if C is None:
    # == If C not given, then model is deterministic. Set C=0. == #
    self.j = 1
    self.C = np.zeros((self.n, self.j))
else:
    self.C = converter(C)
    self.j = self.C.shape[1]
if T:
    # == Model is finite horizon == #
    self.T = T
    self.Rf = np.asarray(Rf, dtype='float32')
    self.P = self.Rf
    self.d = 0
else:
    self.P = None
    self.d = None
    self.T = None
self.F = None

def update_values(self):
    """
    This method is for updating in the finite horizon case.  It shifts the
    current value function

        V_t(x) = x' P_t x + d_t

    and the optimal policy F_t one step *back* in time, replacing the pair
    P_t and d_t with P_{t-1} and d_{t-1}, and F_t with F_{t-1}
    """
    # === Simplify notation === #
    Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C
    P, d = self.P, self.d
    # == Some useful matrices == #
    S1 = Q + self.beta * dot(B.T, dot(P, B))
    S2 = self.beta * dot(B.T, dot(P, A))
    S3 = self.beta * dot(A.T, dot(P, A))
    # == Compute F as (Q + beta B'PB)^{-1} (beta B'PA) == #
    self.F = solve(S1, S2)
    # === Shift P back in time one step == #
    new_P = R - dot(S2.T, solve(S1, S2)) + S3
    # == Recalling that trace(AB) = trace(BA) == #
    new_d = self.beta * (d + np.trace(dot(P, dot(C, C.T))))
    # == Set new state == #
    self.P, self.d = new_P, new_d

def stationary_values(self):
    """


Computes the matrix P and scalar d that represent the value function V(x) = x' P x + d Also computes the control matrix F from u = - Fx """ # === simplify notation === # Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C # === solve Riccati equation, obtain P === # A0, B0 = np.sqrt(self.beta) * A, np.sqrt(self.beta) * B P = riccati.dare(A0, B0, Q, R) # == Compute F == # S1 = Q + self.beta * dot(B.T, dot(P, B)) S2 = self.beta * dot(B.T, dot(P, A)) F = solve(S1, S2) # == Compute d == # d = self.beta * np.trace(dot(P, dot(C, C.T))) / (1 - self.beta) # == Bind states and return values == # self.P, self.F, self.d = P, F, d return P, F, d def compute_sequence(self, x0, ts_length=None): """ Compute and return the optimal state and control sequences x_0,..., x_T and u_0,..., u_T under the assumption that {w_t} is iid and N(0, 1). Parameters =========== x0 : numpy.ndarray The initial state, a vector of length n ts_length : int Length of the simulation -- defaults to T in finite case Returns ======== x_path : numpy.ndarray An n x T matrix, where the t-th column represents x_t u_path : numpy.ndarray A k x T matrix, where the t-th column represents u_t """ # === Simplify notation === # Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C # == Preliminaries, finite horizon case == # if self.T: T = self.T if not ts_length else min(ts_length, self.T) self.P, self.d = self.Rf, 0 # == Preliminaries, infinite horizon case == # else:


    T = ts_length if ts_length else 100
    self.stationary_values()

# == Set up initial condition and arrays to store paths == #
x0 = np.asarray(x0)
x0 = x0.reshape(self.n, 1)  # Make sure x0 is a column vector
x_path = np.empty((self.n, T+1))
u_path = np.empty((self.k, T))
w_path = dot(C, np.random.randn(self.j, T+1))
# == Compute and record the sequence of policies == #
policies = []
for t in range(T):
    if self.T:  # Finite horizon case
        self.update_values()
    policies.append(self.F)
# == Use policy sequence to generate states and controls == #
F = policies.pop()
x_path[:, 0] = x0.flatten()
u_path[:, 0] = - dot(F, x0).flatten()
for t in range(1, T):
    F = policies.pop()
    Ax, Bu = dot(A, x_path[:, t-1]), dot(B, u_path[:, t-1])
    x_path[:, t] = Ax + Bu + w_path[:, t]
    u_path[:, t] = - dot(F, x_path[:, t])
Ax, Bu = dot(A, x_path[:, T-1]), dot(B, u_path[:, T-1])
x_path[:, T] = Ax + Bu + w_path[:, T]
return x_path, u_path, w_path

In the module, the various updating, simulation and fixed point methods are wrapped in a class called LQ, which includes

Instance data:

- The required parameters Q, R, A, B and optional parameters C, beta, T, R_f, N specifying a given LQ model
  * set T and R_f to None in the infinite horizon case
  * set C = None (or zero) in the deterministic case
- The value function and policy data
  * d_t, P_t, F_t in the finite horizon case
  * d, P, F in the infinite horizon case

Methods:

- update_values: shifts d_t, P_t, F_t to their t - 1 values via (4.73), (4.74) and (4.75)
- stationary_values: computes P, d, F in the infinite horizon case
- compute_sequence: simulates the dynamics of x_t, u_t, w_t given x_0 and assuming standard normal shocks

An example of usage is given in lq_permanent_1.py from the main repository, the contents of which are shown below.


This program can be used to replicate the figures shown in our section on the permanent income model. (Some of the plotting techniques are rather fancy and you can ignore those details if you wish.)
import numpy as np import matplotlib.pyplot as plt from lqcontrol import * # == Model parameters == # r = 0.05 beta = 1 / (1 + r) T = 45 c_bar = 2 sigma = 0.25 mu = 1 q = 1e6 # == Formulate as an LQ problem == # Q = 1 R = np.zeros((2, 2)) Rf = np.zeros((2, 2)) Rf[0, 0] = q A = [[1 + r, -c_bar + mu], [0, 1]] B = [[-1], [0]] C = [[sigma], [0]] # == Compute solutions and simulate == # lq = LQ(Q, R, A, B, C, beta=beta, T=T, Rf=Rf) x0 = (0, 1) xp, up, wp = lq.compute_sequence(x0) # == Convert back to assets, consumption and income == # assets = xp[0, :] # b_t c = up.flatten() + c_bar # c_t income = wp[0, 1:] + mu # y_t # == Plot results == # n_rows = 2 fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10)) plt.subplots_adjust(hspace=0.5) for i in range(n_rows): axes[i].grid() axes[i].set_xlabel(r'Time') bbox = (0., 1.02, 1., .102) legend_args = {'bbox_to_anchor' : bbox, 'loc' : 3, 'mode' : 'expand'} p_args = {'lw' : 2, 'alpha' : 0.7} axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args) axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)


axes[0].legend(ncol=2, **legend_args)

axes[1].plot(range(1, T+1), np.cumsum(income - mu), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args)
axes[1].plot(range(T), np.zeros(T), 'k-')
axes[1].legend(ncol=2, **legend_args)

plt.show()

Further Applications
Application 1: Nonstationary Income  Previously we studied a permanent income model that generated consumption smoothing. One unrealistic feature of that model is the assumption that the mean of the random income process does not depend on the consumer's age.

A more realistic income profile is one that rises in early working life, peaks towards the middle and maybe declines toward the end of working life, and falls more during retirement. In this section, we will model this rise and fall as a symmetric inverted U using a polynomial in age.

As before, the consumer seeks to minimize

    E \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q b_T^2 \right\}    (4.85)

subject to b_{t+1} = (1 + r) b_t - c_t + y_t, t >= 0.

For income we now take y_t = p(t) + σ w_{t+1} where p(t) := m_0 + m_1 t + m_2 t². (In the next section we employ some tricks to implement a more sophisticated model.)

The coefficients m_0, m_1, m_2 are chosen such that p(0) = 0, p(T/2) = μ, and p(T) = 0. You can confirm that the specification m_0 = 0, m_1 = Tμ/(T/2)², m_2 = -μ/(T/2)² satisfies these constraints.

To put this into an LQ setting, consider the budget constraint, which becomes

    b_{t+1} = (1 + r) b_t - u_t - \bar c + m_1 t + m_2 t^2 + \sigma w_{t+1}    (4.86)
The fact that bt+1 is a linear function of (bt , 1, t, t2 ) suggests taking these four variables as the state vector xt ) has been made, the remaining speciOnce a good choice of state and control (recall ut = ct c cations fall into place relatively easily Thus, for the dynamics we set bt 1 + r c m m 2 1 1 0 1 0 0 , xt := t , A := 0 1 2 1 t2 T HOMAS S ARGENT AND J OHN S TACHURSKI

1 0 B := 0 , 0

0 C := 0 0

(4.87)

February 5, 2014

4.9. LQ CONTROL PROBLEMS

271

If you expand the expression x_{t+1} = A x_t + B u_t + C w_{t+1} using this specification, you will find that assets follow (4.86) as desired, and that the other state variables also update appropriately.

To implement preference specification (4.85) we take

    Q := 1,
    R := \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
    and
    R_f := \begin{bmatrix} q & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}    (4.88)
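As a quick sanity check on (4.87), we can iterate the deterministic part of the state (zero control and shocks; the values for r, c̄, μ and T below are illustrative) and confirm that the third and fourth state variables do track t and t², and that p(t) hits the intended endpoints:

```python
import numpy as np

r, c_bar, mu, T = 0.05, 1.5, 2.0, 50          # illustrative values
m1 = T * mu / (T / 2)**2
m2 = -mu / (T / 2)**2

# Transition matrix A from (4.87); state is (b_t, 1, t, t^2)'
A = np.array([[1 + r, -c_bar, m1, m2],
              [0,      1,     0,  0 ],
              [0,      1,     1,  0 ],
              [0,      1,     2,  1 ]], dtype=float)

x = np.array([0.0, 1.0, 0.0, 0.0])            # b_0 = 0, time starts at 0
for t in range(1, 6):
    x = A @ x                                  # zero control and shock
    assert np.isclose(x[2], t)                 # third state tracks t
    assert np.isclose(x[3], t**2)              # fourth state tracks t^2

# p(t) = m1 t + m2 t^2 satisfies p(0) = 0, p(T/2) = mu, p(T) = 0
p = lambda t: m1 * t + m2 * t**2
assert np.isclose(p(0), 0) and np.isclose(p(T/2), mu) and np.isclose(p(T), 0)
```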

The next figure shows a simulation of consumption and assets computed using the compute_sequence method of lqcontrol.py with initial assets set to zero.

Once again, smooth consumption is a dominant feature of the sample paths. The asset path exhibits dynamics consistent with standard life cycle theory.

Exercise 1 gives the full set of parameters used here and asks you to replicate the figure.


Application 2: A Permanent Income Model with Retirement  In the previous application, we generated income dynamics with an inverted U shape using polynomials, and placed them in an LQ framework.

It is arguably the case that this income process still contains unrealistic features. A more common earning profile is one where

1. income grows over working life, fluctuating around an increasing trend, with growth flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income

Letting K be the retirement date, we can express these income dynamics by

    y_t = \begin{cases} p(t) + \sigma w_{t+1} & \text{if } t \leq K \\ s & \text{otherwise} \end{cases}    (4.89)

Here

- p(t) := m_1 t + m_2 t², with the coefficients m_1, m_2 chosen such that p(K) = μ and p(0) = p(2K) = 0
- s is retirement income

We suppose that preferences are unchanged and given by (4.77). The budget constraint is also unchanged and given by b_{t+1} = (1 + r) b_t - c_t + y_t.

Our aim is to solve this problem and simulate paths using the LQ techniques described in this lecture. In fact this is a nontrivial problem, as the kink in the dynamics (4.89) at K makes it very difficult to express the law of motion as a fixed-coefficient linear system.

However, we can still use our LQ methods here by suitably linking two component LQ problems. These two LQ problems describe the consumer's behavior during her working life (lq_working) and retirement (lq_retired).

(This is possible because in the two separate periods of life, the respective income processes [polynomial trend and constant] each fit the LQ framework.)

The basic idea is that although the whole problem is not a single time-invariant LQ problem, it is still a dynamic programming problem, and hence we can use appropriate Bellman equations at every stage. Based on this logic, we can

1. solve lq_retired by the usual backwards induction procedure, iterating back to the start of retirement
2. take the start-of-retirement value function generated by this process, and use it as the terminal condition R_f to feed into the lq_working specification
3. solve lq_working by backwards induction from this choice of R_f, iterating back to the start of working life


This process gives the entire life-time sequence of value functions and optimal policies. The next figure shows one simulation based on this procedure.

The full set of parameters used in the simulation is discussed in Exercise 2, where you are asked to replicate the figure.

Once again, the dominant feature observable in the simulation is consumption smoothing. The asset path fits well with standard life cycle theory, with dissaving early in life followed by later saving. Assets peak at retirement and subsequently decline.

Application 3: Monopoly with Adjustment Costs  Consider a monopolist facing stochastic inverse demand function

    p_t = a_0 - a_1 q_t + d_t

Here q_t is output, and the demand shock d_t follows

    d_{t+1} = \rho d_t + \sigma w_{t+1}


where {w_t} is iid and standard normal. The monopolist maximizes the expected discounted sum of present and future profits

    E \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\}    where    \pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2    (4.90)

The term γ(q_{t+1} - q_t)² represents adjustment costs.

This can be formulated as an LQ problem and then solved and simulated, but first let's study the problem and try to get some intuition. One way to start thinking about the problem is to consider what would happen if γ = 0.

Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose output to maximize current profit in each period. It's not difficult to show that profit-maximizing output is

    \bar q_t := \frac{a_0 - c + d_t}{2 a_1}

In light of this discussion, what we might expect for general γ is that

- if γ is close to zero, then q_t will track the time path of q̄_t relatively closely
- if γ is larger, then q_t will be smoother than q̄_t, as the monopolist seeks to avoid adjustment costs

This intuition turns out to be correct. The following figures show simulations produced by solving the corresponding LQ problem. The only difference in parameters across the figures is the size of γ.
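To get a feel for the target process q̄_t, here is a short simulation (parameter values are illustrative only, not those used for the figures) of the demand shock and the corresponding zero-adjustment-cost output:

```python
import numpy as np

a0, a1, c = 5.0, 0.5, 2.0       # illustrative demand and cost parameters
rho, sigma = 0.9, 0.15          # AR(1) persistence and shock volatility
T = 200

rng = np.random.default_rng(0)
d = np.empty(T)
d[0] = 0.0
for t in range(T - 1):
    # Demand shock dynamics: d_{t+1} = rho d_t + sigma w_{t+1}
    d[t + 1] = rho * d[t] + sigma * rng.standard_normal()

# Profit-maximizing output when gamma = 0
q_bar = (a0 - c + d) / (2 * a1)

assert q_bar.shape == (T,)
# q_bar fluctuates around (a0 - c) / (2 a1) = 3
assert np.isclose(q_bar.mean(), (a0 - c) / (2 * a1), atol=0.5)
```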

To produce these figures we converted the monopolist problem into an LQ problem. The key to this conversion is to choose the right state, which can be a bit of an art.

Here we take x_t = (q̄_t, q_t, 1)', while the control is chosen as u_t = q_{t+1} - q_t.

We also manipulated the profit function slightly. In (4.90), current profits are π_t := p_t q_t - c q_t - γ(q_{t+1} - q_t)². Let's now replace π_t in (4.90) with π̃_t := π_t - a_1 q̄_t².

This makes no difference to the solution, since a_1 q̄_t² does not depend on the controls. (In fact we are just adding a constant term to (4.90), and optimizers are not affected by constant terms.)


The reason for making this substitution is that, as you will be able to verify, π̃_t reduces to the simple quadratic

    \tilde\pi_t = -a_1 (q_t - \bar q_t)^2 - \gamma u_t^2

After negation to convert to a minimization problem, the objective becomes

    \min E \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\}    (4.91)

It's now relatively straightforward to find R and Q such that (4.91) can be written as (4.81). Furthermore, the matrices A, B and C from (4.62) can be found by writing down the dynamics of each element of the state.

Exercise 3 asks you to complete this process, and reproduce the preceding figures.

Exercises
Exercise 1  Replicate the figure with polynomial income shown above.

The parameters are r = 0.05, β = 1/(1 + r), c̄ = 1.5, μ = 2, σ = 0.15, T = 50 and q = 10⁴.

Solution: View solution

Exercise 2  Replicate the figure on work and retirement shown above.

The parameters are r = 0.05, β = 1/(1 + r), c̄ = 4, μ = 4, σ = 0.35, K = 20, T = 60, s = 1 and q = 10⁴.

To understand the overall procedure, carefully read the section containing that figure. Some hints are as follows:

First, in order to make our approach work, we must ensure that both LQ problems have the same state variables and control.

- As with previous applications, the control can be set to u_t = c_t - c̄
- For lq_working, x_t, A, B, C can be chosen as in (4.87)
  * Recall that m_1, m_2 are chosen so that p(K) = μ and p(2K) = 0
- For lq_retired, use the same definition of x_t and u_t, but modify A, B, C to correspond to constant income y_t = s
- For lq_retired, set preferences as in (4.88)
- For lq_working, preferences are the same, except that R_f should be replaced by the final value function that emerges from iterating lq_retired back to the start of retirement

With some careful footwork, the simulation can be generated by patching together the simulations from these two separate models.

Solution: View solution


Exercise 3  Reproduce the figures from the monopolist application given above.

For parameters, use a_0 = 5, a_1 = 0.5, σ = 0.15, ρ = 0.9, β = 0.95 and c = 2, while γ varies between 1 and 50 (see figures).

Solution: View solution

4.10 Rational Expectations Equilibrium


"If you're so smart, why aren't you rich?"

Overview
This lecture introduces the concept of rational expectations equilibrium

To illustrate it, we describe a linear quadratic version of a famous and important model due to Lucas and Prescott [LucasPrescott1971]

This 1971 paper is one of a small number of research articles that kicked off the rational expectations revolution

We follow Lucas and Prescott by employing a setting that is readily "Bellmanized" (i.e., capable of being formulated in terms of dynamic programming problems)

Because we use linear quadratic setups for demand and costs, we can adapt the LQ programming techniques described in this lecture

We will learn about how a representative agent's problem differs from a planner's, and how a planning problem can be used to compute rational expectations quantities

We will also learn about how a rational expectations equilibrium can be characterized as a fixed point of a mapping from a perceived law of motion to an actual law of motion

Equality between a perceived and an actual law of motion for endogenous market-wide objects captures in a nutshell what the rational expectations equilibrium concept is all about

Finally, we will learn about the important "Big K, little k" trick, a modeling device widely used in macroeconomics

Except that for us, instead of Big K it will be Big Y, and instead of little k it will be little y

The Big Y, little y trick This widely used method applies in contexts in which a representative firm or agent is a price taker operating within a competitive equilibrium

We want to impose that

The representative firm or individual takes aggregate Y as given when it chooses individual y, but . . .


At the end of the day, Y = y, so that the representative firm is indeed representative

The Big Y, little y trick accomplishes these two goals by

Taking Y as a given state variable or process, beyond the control of the representative individual, when posing the problem of the individual firm or agent; but . . .

Imposing Y = y after having solved the individual's optimization problem

Please watch for how this strategy is applied as the lecture unfolds

Further Reading References for this lecture include

[LucasPrescott1971]

[Sargent1987], chapter XIV

RMT3, chapter 7

Defining Rational Expectations Equilibrium


Our first illustration of rational expectations equilibrium involves a market with n firms, each of whom seeks to maximize profits in the face of adjustment costs

The adjustment costs encourage the firms to make gradual adjustments, which in turn requires consideration of future prices

Individual firms understand that prices are determined by aggregate supply from other firms, and hence each firm must forecast this quantity

In our context, a forecast is expressed as a belief about the law of motion for the aggregate state

Rational expectations equilibrium is obtained when this belief coincides with the actual law of motion generated by production choices made on the basis of this belief

Competitive Equilibrium with Adjustment Costs To illustrate, consider a collection of n firms producing a homogeneous good that is sold in a competitive market

Each of these n firms sells output $y_t$

The price $p_t$ of the good lies on the inverse demand curve

$$p_t = a_0 - a_1 Y_t \qquad (4.92)$$

where

$a_i > 0$ for $i = 0, 1$

$Y_t = n y_t$ is the market-wide level of output


The Firm's Problem The firm is a price taker

While it faces no uncertainty, it does face adjustment costs

In particular, it chooses a production plan to maximize

$$\sum_{t=0}^{\infty} \beta^t r_t, \qquad y_0 \text{ given} \qquad (4.93)$$

where

$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2} \qquad (4.94)$$

Regarding the parameters,

$\beta \in (0, 1)$ is a discount factor

$\gamma > 0$ measures the cost of adjusting the rate of output

Regarding timing, the firm observes $p_t$ and $y_t$ at time t when it chooses $y_{t+1}$

To state the firm's optimization problem completely requires that we specify dynamics for all state variables

This includes ones like $p_t$, which the firm cares about but does not control

We turn to this problem now

Prices and Aggregate Output In view of (4.92), the firm's incentive to forecast the market price translates into an incentive to forecast the level of aggregate output $Y_t$

Aggregate output depends on the choices of other firms

We assume that n is a large number, so that the output of any single firm has a negligible effect on aggregate output

That justifies firms in treating their forecast of aggregate output as being unaffected by their own output decisions

The Firm's Beliefs We suppose the firm believes that market-wide output $Y_t$ follows the law of motion

$$Y_{t+1} = H(Y_t) \qquad (4.95)$$

where $Y_0$ is a known initial condition

The belief function H is an equilibrium object, and hence remains to be determined

Optimal Behavior Given Beliefs For now let's fix a particular belief H in (4.95) and investigate the firm's response

Let v be the corresponding value function for the firm's problem

The value function satisfies the Bellman equation

$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \qquad (4.96)$$


Let's denote the firm's optimal policy function by h, so that

$$y_{t+1} = h(y_t, Y_t) \qquad (4.97)$$

where

$$h(y, Y) := \arg\max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \qquad (4.98)$$

Evidently v and h both depend on H

First Order Characterization of h In what follows it will be helpful to have a second characterization of h, based on first order conditions

The first-order necessary condition for choosing $y'$ is

$$-\gamma (y' - y) + \beta v_y(y', H(Y)) = 0 \qquad (4.99)$$

A well-known envelope result [BenvenisteScheinkman1979] implies that to differentiate v with respect to y we can naively differentiate the right-hand side of (4.96), giving

$$v_y(y, Y) = a_0 - a_1 Y + \gamma (y' - y)$$

Substituting this equation into (4.99) gives the Euler equation

$$-\gamma (y_{t+1} - y_t) + \beta \left[ a_0 - a_1 Y_{t+1} + \gamma (y_{t+2} - y_{t+1}) \right] = 0 \qquad (4.100)$$

In the process of solving its Bellman equation, the firm sets an output path that satisfies (4.100), taking (4.95) as given, and subject to

the initial conditions for $(y_0, Y_0)$

the terminal condition $\lim_{t \to \infty} \beta^t y_t v_y(y_t, Y_t) = 0$

This last condition is called the transversality condition, and acts as a first-order necessary condition "at infinity"

The firm's decision rule solves the difference equation (4.100) subject to the given initial condition $y_0$ and the transversality condition

Note that solving the Bellman equation (4.96) for v and then h in (4.98) yields a decision rule that automatically imposes both the Euler equation (4.100) and the transversality condition

The Actual Law of Motion for $\{Y_t\}$ As we've seen, a given belief translates into a particular decision rule h

Recalling that $Y_t = n y_t$, the actual law of motion for market-wide output is then

$$Y_{t+1} = n h(Y_t / n, Y_t) \qquad (4.101)$$

Thus, when firms believe that the law of motion for market-wide output is (4.95), their optimizing behavior makes the actual law of motion be (4.101)


Definition of Rational Expectations Equilibrium A rational expectations equilibrium or recursive competitive equilibrium of the model with adjustment costs is a decision rule h and an aggregate law of motion H such that

1. Given belief H, the map h is the firm's optimal policy function

2. The law of motion H satisfies $H(Y) = n h(Y/n, Y)$ for all Y

Thus, a rational expectations equilibrium equates the actual and perceived laws of motion (4.95) and (4.101)

Fixed point characterization As we've seen, the firm's optimum problem induces a mapping $\Phi$ from a perceived law of motion H for market-wide output to an actual law of motion $\Phi(H)$

The mapping $\Phi$ is the composition of two operations, taking a perceived law of motion into a decision rule via (4.96)–(4.98), and a decision rule into an actual law via (4.101)

The H component of a rational expectations equilibrium is a fixed point of $\Phi$

Computation of the Equilibrium


Now let's consider the problem of computing the rational expectations equilibrium

Misbehavior of $\Phi$ Readers accustomed to dynamic programming arguments might try to address this problem by choosing some guess $H_0$ for the aggregate law of motion and then iterating with $\Phi$

Unfortunately, the mapping $\Phi$ is not a contraction

In particular, there is no guarantee that direct iterations on $\Phi$ converge⁷

Fortunately, there is another method that works here

The method exploits a general connection between equilibrium and Pareto optimality expressed in the fundamental theorems of welfare economics (see, e.g., [MCWG1995])

Lucas and Prescott [LucasPrescott1971] used this method to construct a rational expectations equilibrium

The details follow

A Planning Problem Approach Our plan of attack is to match the Euler equations of the market problem with those for a single-agent planning problem

As we'll see, this planning problem can be solved by LQ control
⁷ A literature that studies whether models populated with agents who learn can converge to rational expectations equilibria features iterations on a modification of the mapping $\Phi$ that can be approximated as $\lambda \Phi + (1 - \lambda) I$. Here I is the identity operator and $\lambda \in (0, 1)$ is a relaxation parameter. See [MarcetSargent1989] and [EvansHonkapohja2001] for statements and applications of this approach to establish conditions under which collections of adaptive agents who use least squares learning converge to a rational expectations equilibrium.
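The relaxation scheme mentioned in the footnote can be sketched as a small damped fixed-point iterator. The function name `damped_iteration` and the mapping `toy_Phi` below are illustrative assumptions, not part of the lecture's code: computing the model's actual Φ requires solving the firm's LQ problem, which the exercises take up.

```python
import numpy as np

def damped_iteration(Phi, H0, lam=0.5, tol=1e-8, max_iter=1000):
    """Iterate H <- lam * Phi(H) + (1 - lam) * H until convergence.

    H is a parameter vector (e.g. (kappa_0, kappa_1) for a linear belief
    Y' = kappa_0 + kappa_1 * Y); Phi maps perceived-law parameters to
    actual-law parameters and must be supplied by the user.
    """
    H = np.asarray(H0, dtype=float)
    for _ in range(max_iter):
        H_new = lam * np.asarray(Phi(H), dtype=float) + (1 - lam) * H
        if np.max(np.abs(H_new - H)) < tol:
            return H_new
        H = H_new
    return H

# Toy stand-in for Phi (NOT the model's mapping): an affine contraction
# with known fixed point (2, 5)
toy_Phi = lambda H: np.array([0.5 * H[0] + 1.0, 0.2 * H[1] + 4.0])
H_star = damped_iteration(toy_Phi, np.zeros(2))
assert np.allclose(H_star, [2.0, 5.0], atol=1e-6)
```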


The optimal quantities from the planning problem are then rational expectations equilibrium quantities

The rational expectations equilibrium price can be obtained as a shadow price in the planning problem

For convenience, in this section we set n = 1

We first compute a sum of consumer and producer surplus at time t

$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x) \, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \qquad (4.102)$$

The first term is the area under the demand curve, while the second is the social costs of changing output

The planning problem is to choose a production plan $\{Y_t\}$ to maximize

$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$

subject to an initial condition for $Y_0$

Solution of the Planning Problem Evaluating the integral in (4.102) yields the quadratic form $a_0 Y_t - a_1 Y_t^2 / 2$

As a result, the Bellman equation for the planning problem is

$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \qquad (4.103)$$

The associated first order condition is

$$-\gamma (Y' - Y) + \beta V'(Y') = 0 \qquad (4.104)$$

Applying the same Benveniste-Scheinkman formula gives

$$V'(Y) = a_0 - a_1 Y + \gamma (Y' - Y)$$

Substituting this into equation (4.104) and rearranging leads to the Euler equation

$$\beta a_0 + \gamma Y_t - [\beta a_1 + \gamma (1 + \beta)] Y_{t+1} + \gamma \beta Y_{t+2} = 0 \qquad (4.105)$$

The Key Insight Return to equation (4.100) and set $y_t = Y_t$ for all t

(Recall that for this section we've set n = 1 to simplify the calculations)

A small amount of algebra will convince you that when $y_t = Y_t$, equations (4.105) and (4.100) are identical

Thus, the Euler equation for the planning problem matches the second-order difference equation that we derived by

1. finding the Euler equation of the representative firm, and


2. substituting into it the expression $Y_t = n y_t$ that makes the representative firm be representative

If it is appropriate to apply the same terminal conditions for these two difference equations, which it is, then we have verified that a solution of the planning problem also is a rational expectations equilibrium

It follows that for this example we can compute an equilibrium by forming the optimal linear regulator problem corresponding to the Bellman equation (4.103)

The optimal policy function for the planning problem is the aggregate law of motion H that the representative firm faces within a rational expectations equilibrium

Structure of the Law of Motion As you are asked to show in the exercises, the fact that the planner's problem is an LQ problem implies an optimal policy, and hence aggregate law of motion, taking the form

$$Y_{t+1} = \kappa_0 + \kappa_1 Y_t \qquad (4.106)$$

for some parameter pair $\kappa_0, \kappa_1$

Now that we know the aggregate law of motion is linear, we can see from the firm's Bellman equation (4.96) that the firm's problem can be framed as an LQ problem

As you're asked to show in the exercises, the LQ formulation of the firm's problem implies a law of motion that looks as follows

$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \qquad (4.107)$$

Hence a rational expectations equilibrium will be defined by the parameters $(\kappa_0, \kappa_1, h_0, h_1, h_2)$ in (4.106)–(4.107)
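Because both laws of motion are linear, condition 2 of the equilibrium definition reduces to simple algebra: the actual law is $Y_{t+1} = n h(Y_t/n, Y_t) = n h_0 + (h_1 + n h_2) Y_t$, which must match $\kappa_0 + \kappa_1 Y_t$. A minimal sketch of that consistency check follows; the function names and the numbers are illustrations only, not the exercise solutions.

```python
import numpy as np

def implied_aggregate_law(h0, h1, h2, n=1):
    # Actual law: Y' = n * h(Y/n, Y) = n*h0 + (h1 + n*h2) * Y
    return n * h0, h1 + n * h2

def is_equilibrium(kappa0, kappa1, h0, h1, h2, n=1, tol=1e-6):
    # Condition 2: perceived coefficients match the implied actual ones
    k0, k1 = implied_aggregate_law(h0, h1, h2, n)
    return abs(k0 - kappa0) < tol and abs(k1 - kappa1) < tol

# Purely illustrative numbers (not the exercise's solution values)
k0, k1 = implied_aggregate_law(2.0, 0.5, 0.3, n=1)
assert np.allclose([k0, k1], [2.0, 0.8])
assert is_equilibrium(2.0, 0.8, 2.0, 0.5, 0.3)
```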

Exercises
Exercise 1 Consider the firm problem described above

Let the firm's belief function H be as given in (4.106)

Formulate the firm's problem as a discounted optimal linear regulator problem, being careful to describe all of the objects needed

Use the program lqcontrol.py from the main repository to solve the firm's problem for the following parameter values:

$a_0 = 100$, $a_1 = 0.05$, $\beta = 0.95$, $\gamma = 10$, $\kappa_0 = 95.5$, $\kappa_1 = 0.95$

Express the solution of the firm's problem in the form (4.107) and give the values for each $h_j$

If there were n identical competitive firms all behaving according to (4.107), what would (4.107) imply for the actual law of motion (4.95) for market supply?

Solution: View solution


Exercise 2 Consider the following $\kappa_0, \kappa_1$ pairs as candidates for the aggregate law of motion component of a rational expectations equilibrium (see (4.106))

Extending the program that you wrote for exercise 1, determine which if any satisfy the definition of a rational expectations equilibrium

(94.0886298678, 0.923409232937)

(93.2119845412, 0.984323478873)

(95.0818452486, 0.952459076301)

Describe an iterative algorithm that uses the program that you wrote for exercise 1 to compute a rational expectations equilibrium

(You are not being asked actually to use the algorithm you are suggesting)

Solution: View solution

Exercise 3 Recall the planner's problem described above

1. Formulate the planner's problem as an LQ problem

2. Solve it using the same parameter values as in exercise 1: $a_0 = 100$, $a_1 = 0.05$, $\beta = 0.95$, $\gamma = 10$

3. Represent the solution in the form $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$

4. Compare your answer with the results from exercise 2

Solution: View solution

Exercise 4 A monopolist faces the industry demand curve (4.92) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where

$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$

Formulate this problem as an LQ problem

Compute the optimal policy using the same parameters as the previous exercise

In particular, solve for the parameters in

$$Y_{t+1} = m_0 + m_1 Y_t$$

Compare your results with the previous exercise. Comment.

Solution: View solution


CHAPTER

FIVE

ADVANCED APPLICATIONS
This advanced section of the course contains more complex applications, and can be read selectively, according to your interests

5.1 Continuous State Markov Chains


Overview
In a previous lecture we learned about finite Markov chains, a relatively elementary class of stochastic dynamic models

The present lecture extends this analysis to continuous (i.e., uncountable) state Markov chains

Most stochastic dynamic models studied by economists either fit directly into this class or can be represented as continuous state Markov chains after minor modifications

In this lecture, our focus will be on continuous Markov models that

evolve in discrete time

are often nonlinear

The fact that we accommodate nonlinear models here is significant, because linear stochastic models have their own highly developed tool set, as we'll see later on

The question that interests us most is: Given a particular stochastic dynamic model, how will the state of the system evolve over time?

In particular,

What happens to the distribution of the state variables?

Is there anything we can say about the "average behavior" of these variables?

Is there a notion of "steady state" or "long run equilibrium" that's applicable to the model? If so, how can we compute it?

Answering these questions will lead us to revisit many of the topics that occupied us in the finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.



Note: For some people, the term "Markov chain" always refers to a process with a finite or discrete state space. We follow the mainstream mathematical literature (e.g., [MeynTweedie2009]) in using the term to refer to any discrete time Markov process

The Density Case


You are probably aware that some distributions can be represented by densities and some cannot

(For example, distributions on the real numbers R that put positive probability on individual points have no density representation)

We are going to start our analysis by looking at Markov chains where the one step transition probabilities have density representations

The benefit is that the density case offers a very direct parallel to the finite case in terms of notation and intuition

Once we've built some intuition we'll cover the general case

Definitions and Basic Properties In our lecture on finite Markov chains, we studied discrete time Markov chains that evolve on a finite state space S

In this setting, the dynamics of the model are described by a stochastic matrix: a nonnegative square matrix $P = P[i, j]$ such that each row $P[i, \cdot]$ sums to one

The interpretation of P is that $P[i, j]$ represents the probability of transitioning from state i to state j in one unit of time

In symbols,

$$\mathbb{P}\{X_{t+1} = j \mid X_t = i\} = P[i, j]$$

Equivalently,

P can be thought of as a family of distributions $P[i, \cdot]$, one for each $i \in S$

$P[i, \cdot]$ is the distribution of $X_{t+1}$ given $X_t = i$

(As you probably recall, when using NumPy arrays, $P[i, \cdot]$ is expressed as P[i,:])

In this section, we'll allow S to be a subset of R, such as

R itself

the positive reals $(0, \infty)$

a bounded interval $(a, b)$

The family of discrete distributions $P[i, \cdot]$ will be replaced by a family of densities $p(x, \cdot)$, one for each $x \in S$

Analogous to the finite state case, $p(x, \cdot)$ is to be understood as the distribution (density) of $X_{t+1}$ given $X_t = x$

More formally, a stochastic kernel on S is a function $p : S \times S \to \mathbb{R}$ with the property that


1. $p(x, y) \geq 0$ for all $x, y \in S$

2. $\int p(x, y) \, dy = 1$ for all $x \in S$

(Integrals are over the whole space unless otherwise specified)

For example, let $S = \mathbb{R}$ and consider the particular stochastic kernel $p_w$ defined by

$$p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(y - x)^2}{2} \right\} \qquad (5.1)$$

What kind of model does $p_w$ represent?

The answer is, the (normally distributed) random walk

$$X_{t+1} = X_t + \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, 1) \qquad (5.2)$$

To see this, let's find the stochastic kernel p corresponding to (5.2)

Recall that $p(x, \cdot)$ represents the distribution of $X_{t+1}$ given $X_t = x$

Letting $X_t = x$ in (5.2) and considering the distribution of $X_{t+1}$, we see that $p(x, \cdot) = N(x, 1)$

In other words, p is exactly $p_w$, as defined in (5.1)

Connection to Stochastic Difference Equations In the previous section, we made the connection between stochastic difference equation (5.2) and stochastic kernel (5.1)

In economics and time series analysis we meet stochastic difference equations of all different shapes and sizes

It will be useful for us if we have some systematic methods for converting stochastic difference equations into stochastic kernels

To this end, consider the generic (scalar) stochastic difference equation given by

$$X_{t+1} = \mu(X_t) + \sigma(X_t) \, \xi_{t+1} \qquad (5.3)$$

Here we assume that

$\{\xi_t\} \stackrel{\text{IID}}{\sim} \phi$, where $\phi$ is a given density on $\mathbb{R}$

$\mu$ and $\sigma$ are given functions on S, with $\sigma(x) > 0$ for all x

Example 1: The random walk (5.2) is a special case of (5.3), with $\mu(x) = x$ and $\sigma(x) = 1$

Example 2: Consider the ARCH model

$$X_{t+1} = \alpha X_t + \sigma_t \, \xi_{t+1}, \qquad \sigma_t^2 = \beta + \gamma X_t^2, \qquad \beta, \gamma > 0 \qquad (5.4)$$

Alternatively, we can write the model as

$$X_{t+1} = \alpha X_t + (\beta + \gamma X_t^2)^{1/2} \, \xi_{t+1}$$

This is a special case of (5.3) with $\mu(x) = \alpha x$ and $\sigma(x) = (\beta + \gamma x^2)^{1/2}$

Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as

$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \qquad (5.5)$$

Here


s is the rate of savings

$A_{t+1}$ is a production shock

$\delta$ is a depreciation rate

$f : \mathbb{R}_+ \to \mathbb{R}_+$ is a production function satisfying $f(k) > 0$ whenever $k > 0$

(The fixed savings rate can be rationalized as the optimal policy for a particular set of technologies and preferences (see RMT3, section 3.1.2), although we omit the details here)

Equation (5.5) is a special case of (5.3) with $\mu(x) = (1 - \delta) x$ and $\sigma(x) = s f(x)$

Now let's obtain the stochastic kernel corresponding to the generic model (5.3)

To find it, note first that if U is a random variable with density $f_U$, and $V = a + b U$ for some constants a, b with $b > 0$, then the density of V is given by

$$f_V(v) = \frac{1}{b} f_U\left( \frac{v - a}{b} \right) \qquad (5.6)$$
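As a quick numerical sanity check on (5.6), we can compare both sides for a normal U, where V = a + bU is itself normal with mean a and scale b. This snippet is an aside, not part of the lecture's source code.

```python
import numpy as np
from scipy.stats import norm

# If U ~ N(0, 1) and V = a + b*U with b > 0, then V ~ N(a, b), so the
# density of V should equal (1/b) f_U((v - a)/b), as (5.6) claims.
a, b = 1.0, 2.0
v = np.linspace(-4, 6, 9)
lhs = norm(loc=a, scale=b).pdf(v)   # density of V computed directly
rhs = norm().pdf((v - a) / b) / b   # right-hand side of (5.6)
assert np.allclose(lhs, rhs)
```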

(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)

Taking (5.6) as given for the moment, we can obtain the stochastic kernel p for (5.3) by recalling that $p(x, \cdot)$ is the conditional density of $X_{t+1}$ given $X_t = x$

In the present case, this is equivalent to stating that $p(x, \cdot)$ is the density of $Y := \mu(x) + \sigma(x) \, \xi_{t+1}$ when $\xi_{t+1} \sim \phi$

Hence, by (5.6),

$$p(x, y) = \frac{1}{\sigma(x)} \phi\left( \frac{y - \mu(x)}{\sigma(x)} \right) \qquad (5.7)$$
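In code, (5.7) suggests a small "kernel factory" that turns μ, σ and the shock density φ into a function p(x, y). The helper name `kernel` is an illustration, not the lecture's API; the random-walk case checks it against the earlier claim that $p(x, \cdot) = N(x, 1)$.

```python
import numpy as np
from scipy.stats import norm

def kernel(mu, sigma, phi_pdf):
    """Build the stochastic kernel (5.7) for the generic model (5.3)."""
    def p(x, y):
        d = sigma(x)
        return phi_pdf((y - mu(x)) / d) / d
    return p

# Special case: the random walk (5.2) has mu(x) = x and sigma(x) = 1,
# so p(x, .) should be the N(x, 1) density
p_w = kernel(lambda x: x, lambda x: 1.0, norm().pdf)
x, y = 0.5, np.linspace(-3, 4, 8)
assert np.allclose(p_w(x, y), norm(loc=x).pdf(y))
```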

For example, the growth model in (5.5) has stochastic kernel

$$p(x, y) = \frac{1}{s f(x)} \phi\left( \frac{y - (1 - \delta) x}{s f(x)} \right) \qquad (5.8)$$

where $\phi$ is the density of $A_{t+1}$

(Regarding the state space S for this model, a natural choice is $(0, \infty)$, in which case $\sigma(x) = s f(x)$ is strictly positive for all $x \in S$, as required)

Distribution Dynamics In this section of our lecture on finite Markov chains, we asked the following question: If

1. $\{X_t\}$ is a Markov chain with stochastic matrix P

2. the distribution of $X_t$ is known to be $\psi_t$

then what is the distribution of $X_{t+1}$?

Letting $\psi_{t+1}$ denote the distribution of $X_{t+1}$, the answer we gave was that

$$\psi_{t+1}[j] = \sum_{i \in S} P[i, j] \, \psi_t[i]$$
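In the finite case this update is just a vector-matrix product, which is worth seeing once in code before we pass to integrals. The numbers below are arbitrary illustrations.

```python
import numpy as np

# psi_{t+1}[j] = sum_i P[i, j] * psi_t[i] is exactly psi_t @ P
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # stochastic matrix: rows sum to one
psi_t = np.array([0.5, 0.5])     # distribution of X_t
psi_next = psi_t @ P             # distribution of X_{t+1}
assert np.allclose(psi_next, [0.65, 0.35])
assert np.isclose(psi_next.sum(), 1.0)
```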



This intuitive equality states that the probability of being at j tomorrow is the probability of visiting i today and then going on to j, summed over all possible i

In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding

$$\psi_{t+1}(y) = \int p(x, y) \, \psi_t(x) \, dx, \qquad \forall y \in S \qquad (5.9)$$

It is convenient to think of this updating process in terms of an operator

(An operator is just a function, but the term is usually reserved for a function that sends functions into functions)

Let D be the set of all densities on S, and let P be the operator from D to itself that takes density $\psi$ and sends it into new density $\psi P$, where the latter is defined by

$$(\psi P)(y) = \int p(x, y) \, \psi(x) \, dx \qquad (5.10)$$

This operator is usually called the Markov operator corresponding to p

Note: Unlike most operators, we write P to the right of its argument, instead of to the left (i.e., $\psi P$ instead of $P \psi$). This is a common convention, with the intention being to maintain the parallel with the finite case (see here)

With this notation, we can write (5.9) more succinctly as $\psi_{t+1}(y) = (\psi_t P)(y)$ for all y, or, dropping the y and letting "=" indicate equality of functions,

$$\psi_{t+1} = \psi_t P \qquad (5.11)$$

Equation (5.11) tells us that if we specify a distribution for $\psi_0$, then the entire sequence of future distributions can be obtained by iterating with P

It's interesting to note that (5.11) is a deterministic difference equation

Thus, by converting a stochastic difference equation such as (5.3) into a stochastic kernel p and hence an operator P, we convert a stochastic difference equation into a deterministic one (albeit in a much higher dimensional space)

Note: Some people might be aware that discrete Markov chains are in fact a special case of the continuous Markov chains we have just described. The reason is that probability mass functions are densities with respect to the counting measure.

Computation To learn about the dynamics of a given process, it's very useful to compute and study the sequences of densities generated by the model

One way to do this is to try to implement the iteration described by (5.11) using numerical integration

If you actually try to do this, you will soon realize what a difficult and computationally intensive problem this is
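To see why, here is a minimal brute-force sketch of one application of (5.11) by quadrature on a grid, using the random walk kernel (5.1). Even this toy case needs a grid-squared number of kernel evaluations per step; the helper `apply_P` is an illustration, not the lecture's code.

```python
import numpy as np
from scipy.stats import norm

def apply_P(p, psi_vals, grid):
    """One step of (5.11): (psi P)(y) = integral p(x, y) psi(x) dx,
    approximated by a Riemann sum on an evenly spaced grid."""
    dx = grid[1] - grid[0]
    return np.array([np.sum(p(grid, y) * psi_vals) * dx for y in grid])

p_w = lambda x, y: norm.pdf(y - x)   # random walk kernel from (5.1)
grid = np.linspace(-10, 10, 401)
psi0 = norm.pdf(grid)                # X_0 ~ N(0, 1)
psi1 = apply_P(p_w, psi0, grid)
# One step of the random walk gives X_1 ~ N(0, 2), i.e. scale sqrt(2)
assert np.allclose(psi1, norm.pdf(grid, scale=np.sqrt(2)), atol=1e-4)
```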


Another possibility is to discretize the model, but this introduces errors of unknown size

Often discretization is unnecessary and inefficient, since the densities can be efficiently computed by combining simulation and an elegant estimator called the look ahead estimator

Let's go over the ideas with reference to the growth model discussed above, the dynamics of which we repeat here for convenience:

$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \qquad (5.12)$$

Our aim is to compute the sequence $\{\psi_t\}$ associated with this model and fixed initial condition $\psi_0$

To approximate $\psi_t$ by simulation, recall that, by definition, $\psi_t$ is the density of $k_t$ given $k_0 \sim \psi_0$

If we wish to generate observations of this random variable, all we need to do is

1. draw $k_0$ from the specified initial condition $\psi_0$

2. draw the shocks $A_1, \ldots, A_t$ from their specified density

3. compute $k_t$ iteratively via (5.12)

If we repeat this n times, we get n independent observations $k_t^1, \ldots, k_t^n$

With these draws in hand, the next step is to generate some kind of representation of their distribution $\psi_t$

A naive approach would be to use a histogram, or perhaps a smoothed histogram using SciPy's gaussian_kde function

However, in the present setting there is a much better way to do this, based on the look-ahead estimator

With this estimator, to construct an estimate of $\psi_t$, we actually generate n observations of $k_{t-1}$, rather than $k_t$

Now we take these n observations $k_{t-1}^1, \ldots, k_{t-1}^n$ and form the estimate

$$\psi_t^n(y) = \frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \qquad (5.13)$$

where p is the growth model stochastic kernel in (5.8)

What is the justification for this (slightly surprising) estimator?

The idea is that, by the strong law of large numbers,

$$\frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \to \mathbb{E} \, p(k_{t-1}^i, y) = \int p(x, y) \, \psi_{t-1}(x) \, dx = \psi_t(y)$$

with probability one as $n \to \infty$

Here the first equality is by the definition of $\psi_{t-1}$, and the second is by (5.9)

We have just shown that our estimator $\psi_t^n(y)$ in (5.13) converges almost surely to $\psi_t(y)$, which is just what we want to compute

In fact much stronger convergence results are true (see, for example, this paper)


Implementation Here's some code for estimation by this technique

The file is lae.py, and can be found in the main repository
import numpy as np
from scipy.stats import lognorm, beta
import matplotlib.pyplot as plt

class lae:
    """
    An instance is a representation of a look ahead estimator associated
    with a given stochastic kernel p and a vector of observations X.

    For example,

    >>> psi = lae(p, X)
    >>> y = np.linspace(0, 1, 100)
    >>> psi(y)  # Evaluate look ahead estimate at grid of points y
    """

    def __init__(self, p, X):
        """
        Parameters
        ==========
        p : function
            The stochastic kernel.  A function p(x, y) that is vectorized
            in both x and y

        X : array_like
            A vector containing observations
        """
        X = X.flatten()  # So we know what we're dealing with
        n = len(X)
        self.p, self.X = p, X.reshape((n, 1))

    def __call__(self, y):
        """
        Parameters
        ==========
        y : array_like
            A vector of points at which we wish to evaluate the
            look-ahead estimator

        Returns
        =======
        psi_vals : numpy.ndarray
            The values of the density estimate at the points in y
        """
        k = len(y)
        v = self.p(self.X, y.reshape((1, k)))
        psi_vals = np.mean(v, axis=0)  # Average over the n observations
        return psi_vals.flatten()


# == An Example: Stochastic growth with Cobb-Douglas production == #

if __name__ == '__main__':  # If run directly, not imported

    # == Define parameters == #
    s = 0.2
    delta = 0.1
    a_sigma = 0.4                  # A = exp(B) where B ~ N(0, a_sigma)
    alpha = 0.4                    # We set f(k) = k**alpha
    psi_0 = beta(5, 5, scale=0.5)  # Initial distribution
    phi = lognorm(a_sigma)

    def p(x, y):
        """
        Stochastic kernel for the growth model with Cobb-Douglas production.
        Both x and y must be strictly positive.
        """
        d = s * x**alpha
        return phi.pdf((y - (1 - delta) * x) / d) / d

    n = 10000  # Number of observations at each date t
    T = 30     # Compute density of k_t at 1,...,T+1

    # == Generate matrix s.t. t-th column is n observations of k_t == #
    k = np.empty((n, T))
    A = phi.rvs((n, T))
    k[:, 0] = psi_0.rvs(n)  # Draw first column from initial distribution
    for t in range(T-1):
        k[:, t+1] = s * A[:, t] * k[:, t]**alpha + (1 - delta) * k[:, t]

    # == Generate T instances of lae using this data, one for each date t == #
    laes = [lae(p, k[:, t]) for t in range(T)]

    # == Plot == #
    fig, ax = plt.subplots()
    ygrid = np.linspace(0.01, 4.0, 200)
    greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
    greys.reverse()
    for psi, g in zip(laes, greys):
        ax.plot(ygrid, psi(ygrid), color=g, lw=2, alpha=0.6)
    ax.set_xlabel('capital')
    title = r'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={}$'
    ax.set_title(title.format(T))
    plt.show()

Comments on the coding techniques are given just below

When run, the code produces a figure like this

The figure shows part of the density sequence $\{\psi_t\}$, with each density computed via the look ahead estimator

Notice that the sequence of densities shown in the figure seems to be converging; more on this


in just a moment

Another quick comment is that each of these distributions could be interpreted as a cross sectional distribution (recall this discussion)

Comments on the Code Regarding the code itself, we have chosen to implement the look ahead estimator as a class

Given our use of the __call__ method, an instance of this class acts as a callable object, which is essentially a function that can store its own data (see this discussion)

This function returns the right-hand side of (5.13) using

the data and stochastic kernel that it stores as its instance data

the value y as its argument

The function is vectorized, in the sense that if psi is such an instance and y is an array, then the call psi(y) acts elementwise

(This is the reason that we reshaped X and y inside the class: to make vectorization work)

Because the implementation is fully vectorized, it is about as efficient as it would be in C or Fortran
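To see the callable-object pattern in isolation, here is a stripped-down stand-in for the class above (the name `LAE` is hypothetical, not the lae.py source), exercised on the random walk kernel (5.1): when $X_{t-1} \sim N(0, 1)$, the true density of $X_t$ is $N(0, \sqrt{2})$, so the look-ahead estimate should lie close to it.

```python
import numpy as np
from scipy.stats import norm

class LAE:
    # Minimal illustrative version of the lae class above
    def __init__(self, p, X):
        X = np.asarray(X).flatten()
        self.p, self.X = p, X.reshape((len(X), 1))
    def __call__(self, y):
        y = np.asarray(y)
        v = self.p(self.X, y.reshape((1, len(y))))   # shape (n, k)
        return np.mean(v, axis=0).flatten()          # average over draws

p_w = lambda x, y: norm.pdf(y - x)        # vectorized in both arguments
rng = np.random.default_rng(42)
X = rng.standard_normal(5000)             # draws of X_{t-1} ~ N(0, 1)
psi = LAE(p_w, X)                         # callable object storing X and p
y = np.linspace(-4, 4, 9)
vals = psi(y)                             # look-ahead estimate of density of X_t
assert vals.shape == (9,)
assert np.allclose(vals, norm.pdf(y, scale=np.sqrt(2)), atol=0.05)
```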


The General Case


Up until now, we have focused exclusively on continuous state Markov chains where all conditional distributions $p(x, \cdot)$ are densities

As discussed above, not all distributions can be represented as densities

If the conditional distribution of $X_{t+1}$ given $X_t = x$ cannot be represented as a density for some $x \in S$, then we need a slightly different theory

The ultimate option is to switch from densities to probability measures, but not all readers will be familiar with measure theory

We can, however, construct a fairly general theory using distribution functions

Example and Definitions To illustrate the issues, recall that Hopenhayn and Rogerson [HopenhaynRogerson1993] study a model of firm dynamics where individual firm productivity follows the exogenous process

$$X_{t+1} = a + \rho X_t + \xi_{t+1}, \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, \sigma^2)$$

As is, this fits into the density case we treated above

However, the authors wanted this process to take values in [0, 1], so they added boundaries at the end points 0 and 1

One way to write this is

$$X_{t+1} = h(a + \rho X_t + \xi_{t+1}) \quad \text{where} \quad h(x) := x \, \mathbf{1}\{0 \leq x \leq 1\} + \mathbf{1}\{x > 1\}$$
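The function h is just a truncation to [0, 1], so a simulation of the process stays in the unit interval by construction. A short sketch follows; the parameter values a, ρ, σ are arbitrary illustrations, not the paper's calibration.

```python
import numpy as np

def h(x):
    # x * 1{0 <= x <= 1} + 1{x > 1}; values below 0 are mapped to 0
    return np.clip(x, 0.0, 1.0)

a, rho, sigma = 0.1, 0.8, 0.2     # illustrative values only
rng = np.random.default_rng(0)
X = 0.5
for _ in range(100):
    X = h(a + rho * X + sigma * rng.standard_normal())
assert 0.0 <= X <= 1.0            # the process never leaves [0, 1]
assert h(-0.3) == 0.0 and h(1.7) == 1.0 and h(0.4) == 0.4
```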

If you think about it, you will see that for any given $x \in [0, 1]$, the conditional distribution of $X_{t+1}$ given $X_t = x$ puts positive probability mass on 0 and 1

Hence it cannot be represented as a density

What we can do instead is use cumulative distribution functions (cdfs)

To this end, set

$$G(x, y) := \mathbb{P}\{h(a + \rho x + \xi_{t+1}) \leq y\} \qquad (0 \leq x, y \leq 1)$$

This family of cdfs $G(x, \cdot)$ plays a role analogous to the stochastic kernel in the density case

The distribution dynamics in (5.9) are then replaced by

$$F_{t+1}(y) = \int G(x, y) \, F_t(dx) \qquad (5.14)$$

Here $F_t$ and $F_{t+1}$ are cdfs representing the distribution of the current state and next period state

The intuition behind (5.14) is essentially the same as for (5.9)


Computation

If you wish to compute these cdfs, you cannot use the look-ahead estimator as before. Indeed, you should not use any density estimator, since the objects you are estimating/computing are not densities. One good option is simulation as before, combined with the empirical distribution function.
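As a concrete illustration, the following sketch simulates the truncated AR(1) from the example above and estimates G(x, ·) via the empirical distribution function. The parameter values a, ρ, σ here are illustrative, not taken from the lecture:

```python
import numpy as np

a, rho, sigma = 0.1, 0.8, 0.2   # illustrative parameter values

def h(x):
    return np.clip(x, 0.0, 1.0)  # enforce the boundaries at 0 and 1

def G_hat(x, num_draws=10_000, seed=42):
    """Empirical cdf estimate of y -> G(x, y) = P{h(a + rho*x + xi) <= y}."""
    rng = np.random.default_rng(seed)
    draws = h(a + rho * x + sigma * rng.standard_normal(num_draws))
    return lambda y: np.mean(draws <= y)

G = G_hat(0.5)
# G is a nondecreasing step function with G(1.0) = 1, since all mass is in [0, 1]
```

Note that the estimated cdf correctly puts an atom at 1 whenever a + ρx + ξ exceeds the upper boundary, something no density estimator could represent.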

Stability
In our lecture on finite Markov chains we also studied stationarity, stability and ergodicity. Here we will cover the same topics for the continuous case. We will, however, treat only the density case (as in this section), where the stochastic kernel is a family of densities. The general case is relatively similar; references are given below.

Theoretical Results

Analogous to the finite case, given a stochastic kernel p and corresponding Markov operator P as defined in (5.10), a density ψ* on S is called stationary for P if it is a fixed point of the operator P. In other words,

    ψ*(y) = ∫ p(x, y) ψ*(x) dx  for all  y ∈ S    (5.15)

As with the finite case, if ψ* is stationary for P, and the distribution of X_0 is ψ*, then, in view of (5.11), X_t will have this same distribution for all t. Hence ψ* is the stochastic equivalent of a steady state. In the finite case, we learned that at least one stationary distribution exists, although there may be many. When the state space is infinite, the situation is more complicated. Even existence can fail very easily. For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210). However, there are well-known conditions under which a stationary density ψ* exists. With additional conditions, we can also get a unique stationary density (ψ ∈ D and ψ = ψP implies ψ = ψ*), and also global convergence in the sense that

    ψ P^t → ψ*  as  t → ∞,  for any  ψ ∈ D    (5.16)

This combination of existence, uniqueness and global convergence in the sense of (5.16) is often referred to as global stability. Under very similar conditions, we get ergodicity, which means that

    (1/n) Σ_{t=1}^n h(X_t) → ∫ h(x) ψ*(x) dx  as  n → ∞    (5.17)


for any (measurable) function h : S → R such that the right-hand side is finite. Note that the convergence in (5.17) does not depend on the distribution (or value) of X_0. This is actually very important for simulation; it means we can learn about ψ* (i.e., approximate the right-hand side of (5.17) via the left-hand side) without requiring any special knowledge about what to do with X_0.

So what are these conditions we require to get global stability and ergodicity? In essence, it must be the case that

1. Probability mass does not drift off to the edges of the state space
2. Sufficient mixing obtains

For one such set of conditions see theorem 8.2.14 of EDTC. In addition, [StokeyLucas1989] contains a classic (but slightly outdated) treatment of these topics. From the mathematical literature, [LasotaMackey1994] and [MeynTweedie2009] give outstanding in-depth treatments. Section 8.1.2 of EDTC provides detailed intuition, and section 8.3 gives additional references. EDTC, section 11.3.4 provides a specific treatment for the growth model we considered in this lecture.

An Example of Stability

As stated above, the growth model treated here is stable under mild conditions on the primitives; see EDTC, section 11.3.4. We can see this stability in action (in particular, the convergence in (5.16)) by simulating the path of densities from various initial conditions. Here is such a figure. All sequences are converging towards the same limit, regardless of their initial condition. The details regarding initial conditions and so on are given in this exercise, where you are asked to replicate the figure.

Computing Stationary Densities

In the preceding figure, each sequence of densities is converging towards the unique stationary density ψ*. Even from this figure we can get a fair idea what ψ* looks like, and where its mass is located. However, there is a much more direct way to estimate the stationary density, and it involves only a slight modification of the look-ahead estimator. Let's say that we have a model of the form (5.3) that is stable and ergodic. Let p be the corresponding stochastic kernel, as given in (5.7).


To approximate the stationary density ψ*, we can simply generate a long time series X_0, X_1, . . . , X_n and estimate ψ* via

    ψ*_n(y) = (1/n) Σ_{t=1}^n p(X_t, y)    (5.18)

This is essentially the same as the look-ahead estimator (5.13), except that now the observations we generate are a single time series, rather than a cross section. The justification for (5.18) is that, with probability one as n → ∞,

    (1/n) Σ_{t=1}^n p(X_t, y) → ∫ p(x, y) ψ*(x) dx = ψ*(y)

where the convergence is by (5.17) and the equality on the right is by (5.15). The right-hand side is exactly what we want to compute. On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead estimator is very good. The first exercise helps illustrate this point.

Exercises
Exercise 1 Consider the simple threshold autoregressive model

    X_{t+1} = θ |X_t| + (1 − θ²)^{1/2} ξ_{t+1},  where  {ξ_t} is IID and N(0, 1)

This is one of those rare nonlinear stochastic models where an analytical expression for the stationary density is available.


In particular, provided that |θ| < 1, there is a unique stationary density ψ* given by

    ψ*(y) = 2 φ(y) Φ(θ y / (1 − θ²)^{1/2})    (5.19)

Here φ is the standard normal density and Φ is the standard normal cdf. As an exercise, compute the look-ahead estimate of ψ*, as defined in (5.18), and compare it with ψ* in (5.19) to see whether they are indeed close for large n. In doing so, set θ = 0.8 and n = 500. The next figure shows the result of such a computation

The additional density (black line) is a nonparametric kernel density estimate, added to the solution for illustration. (You can try to replicate it before looking at the solution if you want to.) As you can see, the look-ahead estimator is a much tighter fit than the kernel density estimator. If you repeat the simulation you will see that this is consistently the case.

Solution: View solution

Exercise 2 Replicate the figure on global convergence shown above. For the four initial distributions, use the shifted beta distributions
psi_0 = beta(5, 5, scale=0.5, loc=i*2)

for i in range(4)

Solution: View solution


Appendix
Here's the proof of (5.6). Let F_U and F_V be the cumulative distributions of U and V respectively. By the definition of V, we have F_V(v) = P{a + bU ≤ v} = P{U ≤ (v − a)/b}. In other words, F_V(v) = F_U((v − a)/b). Differentiating with respect to v yields (5.6).
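A quick numerical sanity check of this result, with U standard normal (an illustrative choice, not from the text):

```python
import numpy as np

a, b = 1.0, 2.0   # illustrative location and scale, with b > 0
f_U = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # density of U

# By the result just proved, the density of V = a + b*U is f_U((v - a)/b) / b
f_V = lambda v: f_U((v - a) / b) / b

# A genuine density must integrate to one
grid = np.linspace(a - 8 * b, a + 8 * b, 10_001)
dx = grid[1] - grid[0]
total = np.sum(f_V(grid)) * dx   # simple Riemann sum, approximately 1.0
```

Dropping the 1/b factor would make the integral come out to b instead of 1, which is a handy way to catch the classic change-of-variable mistake.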

5.2 Modeling Career Choice


Overview
Next we study a computational problem concerning career and job choices. The model is originally due to Derek Neal [Neal1999] and this exposition draws on the presentation in RMT3, section 6.5.

Model features
- career and job within career both chosen to maximize expected discounted wage flow
- infinite horizon dynamic programming with two state variables

Model
In what follows we distinguish between a career and a job, where
- a career is understood to be a general field encompassing many possible jobs, and
- a job is understood to be a position with a particular firm

For workers, wages can be decomposed into the contribution of job and career

    w_t = θ_t + ε_t

where
- θ_t is the contribution of career at time t
- ε_t is the contribution of job at time t

At the start of time t, the worker has the following options
- retain their current (career, job) pair (θ_t, ε_t), referred to hereafter as "stay put"
- retain their current career θ_t but redraw their job ε_t, referred to hereafter as "new job"
- redraw both their career θ_t and their job ε_t, referred to hereafter as "new life"

Draws of θ and ε are independent of each other and past values, with θ_t ∼ F and ε_t ∼ G


Notice that the worker does not have the option to retain their job but redraw their career; starting a new career always requires starting a new job. A young worker aims to maximize the expected sum of discounted wages

    E Σ_{t=0}^∞ β^t w_t    (5.20)

subject to the choice restrictions specified above. Let V(θ, ε) denote the value function, which is the maximum of (5.20) over all feasible (career, job) policies, given the initial state (θ, ε). The value function obeys

    V(θ, ε) = max{I, II, III}

where

    I   = θ + ε + β V(θ, ε)
    II  = θ + ∫ ε′ G(dε′) + β ∫ V(θ, ε′) G(dε′)
    III = ∫ θ′ F(dθ′) + ∫ ε′ G(dε′) + β ∫∫ V(θ′, ε′) G(dε′) F(dθ′)    (5.21)

Evidently I, II and III correspond to "stay put", "new job" and "new life" respectively.

Parameterization

As in RMT3, section 6.5, we will focus on a discrete version of the model, parameterized as follows:
- both θ and ε take values in the set np.linspace(0, B, N), an even grid of N points between 0 and B inclusive
- N = 50
- B = 5
- β = 0.95

The distributions F and G are discrete distributions generating draws from the grid points np.linspace(0, B, N). A very useful family of discrete distributions is the Beta-binomial family, with probability mass function

    p(k | n, a, b) = (n choose k) B(k + a, n − k + b) / B(a, b),   k = 0, . . . , n

where B is the beta function.

Interpretation:
- draw q from a Beta distribution with shape parameters (a, b)
- run n independent binary trials, each with success probability q
- p(k | n, a, b) is the probability of k successes in these n trials


Nice properties:
- very flexible class of distributions, including uniform, symmetric unimodal, etc.
- only three parameters

Here's a figure showing the effect of different shape parameters when n = 50

The code that generated this figure is as follows


""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: beta-binomial.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ from scipy.special import binom, beta import matplotlib.pyplot as plt import numpy as np def gen_probs(n, a, b): probs = np.zeros(n+1) for k in range(n+1): probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b) return probs n = 50 a_vals = [0.5, 1, 100] b_vals = [0.5, 1, 100] fig, ax = plt.subplots() for a, b in zip(a_vals, b_vals): ab_label = r'$a = %.1f $, $b = %.1f $' % (a, b) ax.plot(range(0, n+1), gen_probs(n, a, b), '-o', label=ab_label) ax.legend() plt.show()


Implementation: career.py
This section describes the module career, which solves the DP problem described above, and is provided in the main repository. The main aim of the module is to implement iteration with the Bellman operator T. In this model, T is defined by

    Tv(θ, ε) = max{I, II, III}

where I, II and III are as given in (5.21), replacing V with v. The module career defines
- a function gen_probs() that computes Beta-binomial probabilities, as above
- a class workerProblem that encapsulates all the details of a particular parameterization
- a function bellman() that corresponds to the Bellman operator
- a function get_greedy() that computes policies from value functions

The code is as follows
import numpy as np
from scipy.special import binom, beta

def gen_probs(n, a, b):
    """
    Generate and return the vector of probabilities for the
    Beta-binomial (n, a, b) distribution.
    """
    probs = np.zeros(n+1)
    for k in range(n+1):
        probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b)
    return probs

class workerProblem:

    def __init__(self, B=5.0, beta=0.95, N=50, F_a=1, F_b=1, G_a=1, G_b=1):
        self.beta, self.N, self.B = beta, N, B
        self.theta = np.linspace(0, B, N)     # set of theta values
        self.epsilon = np.linspace(0, B, N)   # set of epsilon values
        self.F_probs = gen_probs(N-1, F_a, F_b)
        self.G_probs = gen_probs(N-1, G_a, G_b)
        self.F_mean = np.sum(self.theta * self.F_probs)
        self.G_mean = np.sum(self.epsilon * self.G_probs)

def bellman(w, v):
    """
    The Bellman operator.

    * w is an instance of workerProblem
    * v is a 2D NumPy array representing the value function

    The array v should be interpreted as v[i, j] = v(theta_i, epsilon_j).
    Returns the updated value function Tv as an array of shape v.shape
    """
    new_v = np.empty(v.shape)
    for i in range(w.N):
        for j in range(w.N):
            v1 = w.theta[i] + w.epsilon[j] + w.beta * v[i, j]
            v2 = w.theta[i] + w.G_mean + w.beta * np.dot(v[i, :], w.G_probs)
            v3 = w.G_mean + w.F_mean + w.beta * \
                np.dot(w.F_probs, np.dot(v, w.G_probs))
            new_v[i, j] = max(v1, v2, v3)
    return new_v

def get_greedy(w, v):
    """
    Compute optimal actions taking v as the value function.  Parameters
    are the same as for bellman().  Returns a 2D NumPy array "policy",
    where policy[i, j] is the optimal action at state (theta_i, epsilon_j).
    The optimal action is represented as an integer in the set 1, 2, 3,
    where 1 = 'stay put', 2 = 'new job' and 3 = 'new life'
    """
    policy = np.empty(v.shape, dtype=int)
    for i in range(w.N):
        for j in range(w.N):
            v1 = w.theta[i] + w.epsilon[j] + w.beta * v[i, j]
            v2 = w.theta[i] + w.G_mean + w.beta * np.dot(v[i, :], w.G_probs)
            v3 = w.G_mean + w.F_mean + w.beta * \
                np.dot(w.F_probs, np.dot(v, w.G_probs))
            if v1 > max(v2, v3):
                action = 1
            elif v2 > max(v1, v3):
                action = 2
            else:
                action = 3
            policy[i, j] = action
    return policy

The default probability distributions in workerProblem correspond to discrete uniform distributions (see the Beta-binomial figure). In fact all our default settings correspond to the version studied in RMT3, section 6.5. Hence we can reproduce figures 6.5.1 and 6.5.2 shown there, which exhibit the value function and optimal policy respectively. Here's the value function. The code used to produce this plot was
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: career_vf_plot.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.2. MODELING CAREER CHOICE

304

Figure 5.1: Value function with uniform probabilities

""" import matplotlib.pyplot as plt from mpl_toolkits.mplot3d.axes3d import Axes3D from matplotlib import cm from career import * from compute_fp import compute_fixed_point # === solve for the value function === # wp = workerProblem() v_init = np.ones((wp.N, wp.N))*100 v = compute_fixed_point(bellman, wp, v_init) # === plot value function === # fig = plt.figure(figsize=(8,6)) ax = fig.add_subplot(111, projection='3d') tg, eg = np.meshgrid(wp.theta, wp.epsilon) ax.plot_surface(tg, eg, v.T, rstride=2, cstride=2, cmap=cm.jet, alpha=0.5, linewidth=0.25) ax.set_zlim(150, 200) ax.set_xlabel('theta', fontsize=14) ax.set_ylabel('epsilon', fontsize=14) plt.show()

The code pulls in the convenience function compute_fixed_point() from the module compute_fp, which can be found in the main repository. The optimal policy can be represented as follows (see Exercise 3 for code)


Interpretation:
- If both job and career are poor or mediocre, the worker will experiment with a new job and new career
- If career is sufficiently good, the worker will hold it and experiment with new jobs until a sufficiently good one is found
- If both job and career are good, the worker will stay put

Notice that the worker will always hold on to a sufficiently good career, but not necessarily hold on to even the best paying job. The reason is that high lifetime wages require both variables to be large, and the worker cannot change careers without changing jobs. Sometimes a good job must be sacrificed in order to change to a better career.

Exercises
Exercise 1 Using the default parameterization in the class workerProblem, generate and plot typical sample paths for θ and ε when the worker follows the optimal policy. In particular, modulo randomness, reproduce the following figure (where the horizontal axis represents time).


Hint: To generate the draws from the distributions F and G, use the module discrete_rv, which can be found in the main repository.

Solution: View solution

Exercise 2 Let's now consider how long it takes for the worker to settle down to a permanent job, given a starting point of (θ, ε) = (0, 0). In other words, we want to study the distribution of the random variable

    T* := the first point in time from which the worker's job no longer changes

Evidently, the worker's job becomes permanent if and only if (θ_t, ε_t) enters the "stay put" region of (θ, ε) space. Letting S denote this region, T* can be expressed as the first passage time to S under the optimal policy:

    T* := inf{t ≥ 0 | (θ_t, ε_t) ∈ S}

Collect 25,000 draws of this random variable and compute the median (which should be about 7). Repeat the exercise with β = 0.99 and interpret the change.

Solution: View solution

Exercise 3 As best you can, reproduce the figure showing the optimal policy.

Hint: The get_greedy() function returns a representation of the optimal policy where values 1, 2 and 3 correspond to "stay put", "new job" and "new life" respectively. Use this and contourf from matplotlib.pyplot to produce the different shadings.


Now set G_a = G_b = 100 and generate a new figure with these parameters. Interpret.

Solution: View solution

5.3 On-the-Job Search


Overview
In this section we solve a simple on-the-job search model based on RMT3, exercise 6.18; see also [add Jovanovic reference]

Model features
- job-specific human capital accumulation combined with on-the-job search
- infinite horizon dynamic programming with one state variable and two controls

Model
Let
- x_t denote the time-t job-specific human capital of a worker employed at a given firm
- w_t denote current wages

Let w_t = x_t(1 − s_t − φ_t), where
- φ_t is investment in job-specific human capital for the current role
- s_t is search effort, devoted to obtaining new offers from other firms

For as long as the worker remains in the current job, evolution of {x_t} is given by x_{t+1} = G(x_t, φ_t). When search effort at t is s_t, the worker receives a new job offer with probability π(s_t) ∈ [0, 1]. The value of the offer is U_{t+1}, where {U_t} is iid with common distribution F. The worker has the right to reject the current offer and continue with the existing job. In particular, x_{t+1} = U_{t+1} if the worker accepts and x_{t+1} = G(x_t, φ_t) if the worker rejects. Letting b_{t+1} ∈ {0, 1} be binary with b_{t+1} = 1 indicating an offer, we can write

    x_{t+1} = (1 − b_{t+1}) G(x_t, φ_t) + b_{t+1} max{G(x_t, φ_t), U_{t+1}}    (5.22)

The agent's objective is to maximize the expected discounted sum of wages via controls {s_t} and {φ_t}. Taking the expectation of V(x_{t+1}) and using (5.22), the Bellman equation for this problem can be written as

    V(x) = max_{s + φ ≤ 1} { x(1 − s − φ) + β(1 − π(s)) V[G(x, φ)] + β π(s) ∫ V[G(x, φ) ∨ u] F(du) }    (5.23)


Here nonnegativity of s and φ is understood, while a ∨ b := max{a, b}.

Parameterization

In the implementation below, we will focus on the parameterization

    G(x, φ) = A(xφ)^α,  π(s) = √s  and  F = Beta(2, 2)

with default parameter values
- A = 1.4
- α = 0.6
- β = 0.96

The Beta(2, 2) distribution is supported on (0, 1). It has a unimodal, symmetric density peaked at 0.5.

Back-of-the-Envelope Calculations

Before we solve the model, let's make some quick calculations that provide intuition on what the solution should look like. To begin, observe that the worker has two instruments to build capital and hence wages:

1. invest in capital specific to the current job via φ
2. search for a new job with better job-specific capital match via s

Since wages are x(1 − s − φ), the marginal cost of investment via either φ or s is identical. Our risk neutral worker should focus on whatever instrument has the highest expected return. The relative expected return will depend on x. For example, suppose first that x = 0.05
- If s = 1 and φ = 0, then since G(x, φ) = 0, taking expectations of (5.22) gives expected next period capital equal to π(s) EU = EU = 0.5
- If s = 0 and φ = 1, then next period capital is G(x, φ) = G(0.05, 1) ≈ 0.23

Both rates of return are good, but the return from search is better. Next suppose that x = 0.4
- If s = 1 and φ = 0, then expected next period capital is again 0.5
- If s = 0 and φ = 1, then G(x, φ) = G(0.4, 1) ≈ 0.8

Return from investment via φ dominates expected return from search. Combining these observations gives us two informal predictions:

1. At any given state x, the two controls φ and s will function primarily as substitutes; the worker will focus on whichever instrument has the higher expected return
2. For sufficiently small x, search will be preferable to investment in job-specific human capital. For larger x, the reverse will be true

Now let's turn to implementation, and see if we can match our predictions.
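Before moving on, the back-of-the-envelope numbers above can be checked directly. A sketch using the default parameterization:

```python
A, alpha = 1.4, 0.6   # default parameter values

def G(x, phi):
    """Law of motion for job-specific capital under pure investment."""
    return A * (x * phi)**alpha

EU = 0.5   # mean of the Beta(2, 2) offer distribution

for x in (0.05, 0.4):
    pure_search = EU          # s = 1, phi = 0: expected next capital is E[U]
    pure_invest = G(x, 1.0)   # s = 0, phi = 1: next capital is G(x, 1)
    print(x, round(pure_search, 2), round(pure_invest, 2))
```

For x = 0.05 investment yields about 0.23 versus 0.5 from search, while for x = 0.4 investment yields about 0.81, consistent with the predictions.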


Implementation: jv.py
This section describes the module jv, which solves the DP problem described above. As with all other code, the file can be found in the main repository.

Code

The code listing is followed by a detailed explanation; skim the code and move on to the explanation before reading the code in detail.
import numpy as np
from scipy.integrate import fixed_quad as integrate
from scipy.optimize import fmin_slsqp as minimize
import scipy.stats as stats
from scipy import interp

epsilon = 1e-4  # A small number, used in the optimization routine

class workerProblem:

    def __init__(self, A=1.4, alpha=0.6, beta=0.96, grid_size=50):
        """
        This class is just a "struct" to hold the attributes of a given model.
        """
        self.A, self.alpha, self.beta = A, alpha, beta
        # === set defaults for G, pi and F === #
        self.G = lambda x, phi: A * (x * phi)**alpha
        self.pi = np.sqrt
        self.F = stats.beta(2, 2)
        # === Set up grid over the state space for DP === #
        # Max of grid is the max of a large quantile value for F and the
        # fixed point y = G(y, 1).
        grid_max = max(A**(1 / (1 - alpha)), self.F.ppf(1 - epsilon))
        self.x_grid = np.linspace(epsilon, grid_max, grid_size)

def bellman_operator(wp, V, brute_force=False, return_policies=False):
    """
    Parameter wp is an instance of workerProblem.  This function returns
    the approximate value function TV by applying the Bellman operator
    associated with the model wp to the function V.  Returns TV, or the
    V-greedy policies s_policy and phi_policy when return_policies=True.

    In the function, the array V is replaced below with a function Vf that
    implements linear interpolation over the points (V(x), x) for x in
    x_grid.  If the brute_force flag is true, then grid search is
    performed at each maximization step.  In either case, T returns a
    NumPy array representing the updated values TV(x) over x in x_grid.
    """
    # === simplify names, set up arrays, etc. === #
    G, pi, F, beta = wp.G, wp.pi, wp.F, wp.beta
    Vf = lambda x: interp(x, wp.x_grid, V)
    N = len(wp.x_grid)
    new_V, s_policy, phi_policy = np.empty(N), np.empty(N), np.empty(N)
    a, b = F.ppf(0.005), F.ppf(0.995)  # Quantiles, for integration
    c1 = lambda z: 1 - sum(z)          # used to enforce s + phi <= 1
    c2 = lambda z: z[0] - epsilon      # used to enforce s >= epsilon
    c3 = lambda z: z[1] - epsilon      # used to enforce phi >= epsilon
    guess, constraints = (0.2, 0.2), [c1, c2, c3]

    # === solve r.h.s. of Bellman equation === #
    for i, x in enumerate(wp.x_grid):

        # === set up objective function === #
        def w(z):
            s, phi = z
            integrand = lambda u: Vf(np.maximum(G(x, phi), u)) * F.pdf(u)
            integral, err = integrate(integrand, a, b)
            q = pi(s) * integral + (1 - pi(s)) * Vf(G(x, phi))
            return - x * (1 - phi - s) - beta * q  # minus because we minimize

        # === either use SciPy solver === #
        if not brute_force:
            max_s, max_phi = minimize(w, guess, ieqcons=constraints, disp=0)
            max_val = -w((max_s, max_phi))
        # === or search on a grid === #
        else:
            search_grid = np.linspace(epsilon, 1, 15)
            max_val = -1
            for s in search_grid:
                for phi in search_grid:
                    current_val = -w((s, phi)) if s + phi <= 1 else -1
                    if current_val > max_val:
                        max_val, max_s, max_phi = current_val, s, phi

        # === store results === #
        new_V[i] = max_val
        s_policy[i], phi_policy[i] = max_s, max_phi

    if return_policies:
        return s_policy, phi_policy
    else:
        return new_V

Explanation

The file jv.py begins with a few quick imports
- fixed_quad is a simple non-adaptive integration routine
- fmin_slsqp is a minimization routine that permits inequality constraints

Next we build a simple class called workerProblem that packages all the parameters and other basic attributes of a given model. What's the point of defining such a class?


The point is that when we come to writing the code for the Bellman operator, we want to make it relatively generic, and hence reusable. For example, use generic G(x, φ) instead of specific A(xφ)^α. So any specifics need to be passed in to the Bellman operator when we call it. To avoid a long list of parameters, it is convenient to wrap all the specifics in a single class, an instance of which can be passed to the operator; hence the class workerProblem.

Regarding the function bellman_operator(), it takes as arguments
- an instance of workerProblem, which contains all the specifics of a particular parameterization
- a candidate value function V, to be updated to TV via

    TV(x) = max_{s + φ ≤ 1} w(s, φ)

where

    w(s, φ) := x(1 − s − φ) + β(1 − π(s)) V[G(x, φ)] + β π(s) ∫ V[G(x, φ) ∨ u] F(du)    (5.24)

In the code we work with the negative of w, minimizing instead of maximizing, to fit with SciPy's optimization routines. When we represent V, it will be with a NumPy array V giving values on the grid x_grid. But to evaluate the right-hand side of (5.24), we need a function, so we replace the arrays V and x_grid with a function Vf that gives linear interpolation of V on x_grid. In the preliminaries of the function bellman_operator()
- from the array V we define a linear interpolation Vf of its values
- c1 is used to implement the constraint s + φ ≤ 1
- c2 is used to implement s ≥ ε, a numerically stable alternative to the true constraint s ≥ 0
- c3 does the same for φ

Inside the for loop, for each x in the grid over the state space, we set up the function w(z) = w(s, φ) defined in (5.24). The function is maximized over all feasible (s, φ) pairs (by minimizing its negative), either by
- a relatively sophisticated solver from SciPy called fmin_slsqp, or
- brute force search over a grid

The former is much faster, but convergence to the global optimum is not guaranteed. Grid search is a simple way to check results.

Solving for Policies


Let's plot the optimal policies and see what they look like.


The code is in a file jv_test.py that imports jv (available from the main repository) and looks as follows
from jv import workerProblem, bellman_operator
from compute_fp import compute_fixed_point
import matplotlib.pyplot as plt

# === solve for optimal policy === #
wp = workerProblem(grid_size=25)
v_init = wp.x_grid * 0.5
V = compute_fixed_point(bellman_operator, wp, v_init, max_iter=40)
s_policy, phi_policy = bellman_operator(wp, V, return_policies=True)

# === plot policies === #
fig, ax = plt.subplots()
ax.set_xlim(0, max(wp.x_grid))
ax.set_ylim(-0.1, 1.1)
ax.plot(wp.x_grid, phi_policy, 'b-', label='phi')
ax.plot(wp.x_grid, s_policy, 'g-', label='s')
ax.legend()
plt.show()

It produces the following figure

Figure 5.2: Optimal policies

The horizontal axis is the state x, while the vertical axis gives s(x) and φ(x). Overall, the policies match well with our predictions from the section Back-of-the-Envelope Calculations
- The worker switches from one investment strategy to the other depending on relative return
- For low values of x, the best option is to search for a new job
- Once x is larger, the worker does better by investing in human capital specific to the current position


Exercises
Exercise 1 Let's look at the dynamics for the state process {x_t} associated with these policies. The dynamics are given by (5.22) when φ_t and s_t are chosen according to the optimal policies, and

    P{b_{t+1} = 1} = π(s_t)

Since the dynamics are random, analysis is a bit subtle. One way to do it is to plot, for each x in a relatively fine grid called plot_grid, a large number K of realizations of x_{t+1} given x_t = x. Plot this with one dot for each realization, in the form of a 45 degree diagram. Set:
K = 50
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)

By examining the plot, argue that under the optimal policies, the state x_t will converge to a constant value x̄ close to unity. Argue that at the steady state, s_t ≈ 0 and φ_t ≈ 0.6.

Solution: View solution

Exercise 2 In the preceding exercise we found that s_t converges to zero and φ_t converges to about 0.6. Since these results were calculated at a value of β close to one, let's compare them to the best choice for an infinitely patient worker. Intuitively, an infinitely patient worker would like to maximize steady state wages, which are a function of steady state capital. You can take it as given (it's certainly true) that the infinitely patient worker does not search in the long run (i.e., s_t = 0 for large t). Thus, given φ, steady state capital is the positive fixed point x*(φ) of the map x ↦ G(x, φ). Steady state wages can be written as w*(φ) = x*(φ)(1 − φ). Graph w*(φ) with respect to φ, and examine the best choice of φ. Can you give a rough interpretation for the value that you see?

Solution: View solution
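As a hint for Exercise 2 (not the full solution): under the default G, the fixed point x*(φ) solves x = A(xφ)^α, which rearranges to x*(φ) = (A φ^α)^{1/(1−α)}. A quick sketch:

```python
import numpy as np

A, alpha = 1.4, 0.6   # default parameter values

def x_star(phi):
    # positive fixed point of the map x -> A * (x * phi)**alpha
    return (A * phi**alpha)**(1 / (1 - alpha))

def w_star(phi):
    return x_star(phi) * (1 - phi)   # steady state wages

phi_grid = np.linspace(0.01, 0.99, 99)
best_phi = phi_grid[np.argmax(w_star(phi_grid))]
# best_phi turns out to be close to alpha, which suggests an interpretation
```

Plotting w_star over phi_grid reproduces the graph the exercise asks for.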


5.4 Search with Offer Distribution Unknown


Overview
In this lecture we consider an extension of the job search model developed by John J. McCall [McCall1970]. In the McCall model, an unemployed worker decides when to accept a permanent position at a specified wage, given
- his or her discount rate
- the level of unemployment compensation
- the distribution from which wage offers are drawn

In the version considered below, the wage distribution is unknown and must be learned. The discussion is based on the presentation in RMT3, section 6.6.

Model features
- Infinite horizon dynamic programming with two states and one binary control
- Bayesian updating to learn the unknown distribution

Model
Let's first recall the basic McCall model [McCall1970] and then add the variation we want to consider.

The Basic McCall Model

Consider an unemployed worker who is presented in each period with a permanent job offer at wage w_t. At time t, our worker has two choices
1. Accept the offer and work permanently at constant wage w_t
2. Reject the offer, receive unemployment compensation c, and reconsider next period

The wage sequence {w_t} is iid and generated from known density h. The worker aims to maximize the expected discounted sum of earnings E Σ_{t=0}^∞ β^t y_t. Trade-off:
- Waiting too long for a good offer is costly, since the future is discounted
- Accepting too early is costly, since better offers will arrive with probability one

Let V(w) denote the maximal expected discounted sum of earnings that can be obtained by an unemployed worker who starts with wage offer w in hand


The function V satises the recursion V (w) = max w , c+ 1 V (w )h(w )dw (5.25)

where the two terms on the r.h.s. are the respective payoffs from accepting and rejecting the current offer w The optimal policy is a map from states into actions, and hence a binary function of w }, where Not surprisingly, it turns out to have the form 1{w w is a constant depending on ( , h, c) called the reservation wage w } is an indicator function returning 1 if w w and 0 otherwise 1{w w 1 indicates accept and 0 indicates reject For further details see RMT3, section 6.3 Offer Distribution Unknown Now lets extend the model by considering the variation presented in RMT3, section 6.6 The model is as above, apart from the fact that the density h is unknown the worker learns about h by starting with a prior and updating based on wage offers that he/she observes The worker knows there are two possible distributions F and G with densities f and g At the start of time, nature selects h to be either f or g the wage distribution from which the entire sequence {wt } will be drawn This choice is not observed by the worker, who puts prior probability 0 on f being chosen Update rule: workers time t estimate of the distribution is t f + (1 t ) g, where t updates via t +1 = t f ( w t +1 ) t f ( w t +1 ) + (1 t ) g ( w t +1 ) (5.26)

This last expression follows from Bayes' rule, which tells us that

    \mathbb{P}\{h = f \mid W = w\} = \frac{\mathbb{P}\{W = w \mid h = f\} \, \mathbb{P}\{h = f\}}{\mathbb{P}\{W = w\}}
    \quad \text{where} \quad
    \mathbb{P}\{W = w\} = \sum_{\psi \in \{f, g\}} \mathbb{P}\{W = w \mid h = \psi\} \, \mathbb{P}\{h = \psi\}
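The update rule (5.26) can be sketched in a few lines. Here the two Beta densities from the baseline parameterization below stand in for f and g:

```python
import numpy as np
from scipy.stats import beta as beta_dist

# Candidate wage densities on [0, 2]; these match the baseline parameterization
f = beta_dist(1, 1, scale=2).pdf
g = beta_dist(3, 1.2, scale=2).pdf

def update_pi(pi, w):
    """Bayesian update (5.26) of the belief pi = P{h = f} after observing wage w."""
    a = pi * f(w)
    return a / (a + (1 - pi) * g(w))
```

Observing a wage that is more likely under f than under g pushes the belief toward f, and vice versa.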

The fact that (5.26) is recursive allows us to progress to a recursive solution method. Letting

    h_\pi(w) := \pi f(w) + (1 - \pi) g(w)
    \quad \text{and} \quad
    q(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)}

we can express the value function for the unemployed worker recursively as follows:

    V(w, \pi) = \max \left\{ \frac{w}{1-\beta},\; c + \beta \int V(w', \pi') \, h_\pi(w') \, dw' \right\}
    \quad \text{where} \quad \pi' = q(w', \pi)    (5.27)

Notice that the current guess π is a state variable, since it affects the worker's perception of probabilities for future rewards.


Parameterization

Following section 6.6 of RMT3, our baseline parameterization will be

- f = Beta(1, 1) and g = Beta(3, 1.2)
- β = 0.95 and c = 0.6

The densities f and g have the following shape
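These densities can be constructed with scipy.stats, scaled to the wage interval [0, w_max] exactly as in the odu_vfi module shown below:

```python
from scipy.stats import beta as beta_dist

# Baseline densities, scaled to [0, w_max] as in odu_vfi.py
w_max = 2
F = beta_dist(1, 1, scale=w_max)
G = beta_dist(3, 1.2, scale=w_max)
f, g = F.pdf, G.pdf
```

Note that f is flat (uniform on [0, 2]) while g puts more mass on high wage offers.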

Looking Forward

What kind of optimal policy might result from (5.27) and the parameterization specified above?

Intuitively, if we accept at w_a and w_a ≤ w_b, then, all other things being given, we should also accept at w_b. This suggests a policy of accepting whenever w exceeds some threshold value w̄. But w̄ should depend on π; in fact it should be decreasing in π, because

- f is a less attractive offer distribution than g
- larger π means more weight on f and less on g

Thus larger π depresses the worker's assessment of her future prospects, and relatively low current offers become more attractive.

Summary: We conjecture that the optimal policy is of the form 1{w ≥ w̄(π)} for some decreasing function w̄.

Take 1: Solution by VFI


Let's set about solving the model and see how our results match with our intuition.


We begin by solving via value function iteration (VFI), which is natural but ultimately turns out to be second best. VFI is implemented in the module odu_vfi, provided in the main repository. The code is as follows, but read the discussion given beneath it first.
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: odu_vfi.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Solves the "Offer Distribution Unknown" Model by value function iteration. Note that a much better technique is given in solution_odu_ex1.py """ from scipy.interpolate import LinearNDInterpolator from scipy.integrate import fixed_quad from scipy.stats import beta as beta_distribution import numpy as np class searchProblem: """ A class to store a given parameterization of the "offer distribution unknown" model. """ def __init__(self, beta=0.95, c=0.6, F_a=1, F_b=1, G_a=3, G_b=1.2, w_max=2, w_grid_size=40, pi_grid_size=40): """ Sets up parameters and grid. The attribute "grid_points" defined below is a 2 column array that stores the 2D grid points for the DP problem. Each row represents a single (w, pi) pair. """ self.beta, self.c, self.w_max = beta, c, w_max self.F = beta_distribution(F_a, F_b, scale=w_max) self.G = beta_distribution(G_a, G_b, scale=w_max) self.f, self.g = self.F.pdf, self.G.pdf # Density functions self.pi_min, self.pi_max = 1e-3, 1 - 1e-3 # Avoids instability self.w_grid = np.linspace(0, w_max, w_grid_size) self.pi_grid = np.linspace(self.pi_min, self.pi_max, pi_grid_size) x, y = np.meshgrid(self.w_grid, self.pi_grid) self.grid_points = np.column_stack((x.ravel(1), y.ravel(1))) def q(self, w, pi): """ Updates pi using Bayes' rule and the current wage observation w. """ new_pi = 1.0 / (1 + ((1 - pi) * self.g(w)) / (pi * self.f(w))) # Return new_pi when in [pi_min, pi_max], and the end points otherwise return np.maximum(np.minimum(new_pi, self.pi_max), self.pi_min)



def bellman(sp, v): """ The Bellman operator. * sp is an instance of searchProblem * v is an approximate value function represented as a one-dimensional array.

""" f, g, beta, c, q = sp.f, sp.g, sp.beta, sp.c, sp.q # Simplify names vf = LinearNDInterpolator(sp.grid_points, v) N = len(v) new_v = np.empty(N) for i in range(N): w, pi = sp.grid_points[i,:] v1 = w / (1 - beta) integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m) + (1 - pi) * g(m)) integral, error = fixed_quad(integrand, 0, sp.w_max) v2 = c + beta * integral new_v[i] = max(v1, v2) return new_v def get_greedy(sp, v): """ Compute optimal actions taking v as the value function. Parameters are the same as for bellman(). Returns a NumPy array called "policy", where policy[i] is the optimal action at sp.grid_points[i,:]. The optimal action is represented in binary, where 0 indicates reject and 1 indicates accept. """ f, g, beta, c, q = sp.f, sp.g, sp.beta, sp.c, sp.q # Simplify names vf = LinearNDInterpolator(sp.grid_points, v) N = len(v) policy = np.zeros(N, dtype=int) for i in range(N): w, pi = sp.grid_points[i,:] v1 = w / (1 - beta) integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m) + (1 - pi) * g(m)) integral, error = fixed_quad(integrand, 0, sp.w_max) v2 = c + beta * integral policy[i] = v1 > v2 # Evaluates to 1 or 0 return policy

The module begins by defining a class searchProblem, an instance of which stores the parameters and attributes needed to compute optimal actions. The Bellman operator is implemented as bellman(), and get_greedy() computes an approximate optimal policy from a guess v of the value function.

We will omit a detailed discussion of the code because you are about to construct a more efficient solution method. Before that, let's look quickly at solutions computed from this code.

Value function:


The optimal policy:

Code for producing these figures can be found in file odu_vfi_plots.py from the main repository. The results fit well with our intuition from the section Looking Forward:

- The black line in the figure above corresponds to the function w̄(π) introduced there
- It is decreasing as expected

Take 2: A More Efficient Method


Our implementation of VFI can be optimized to some degree, but instead of pursuing that, let's consider another method to solve for the optimal policy. This method

- uses iteration with an operator having the same contraction rate as the Bellman operator, but
- is one dimensional rather than two dimensional, and
- has no maximization step

As a consequence, the algorithm is orders of magnitude faster than VFI.

Another Functional Equation

To begin, note that when w = w̄(π), the worker is indifferent between accepting and rejecting. Hence the two choices on the right-hand side of (5.27) have equal value:

    \frac{\bar{w}(\pi)}{1-\beta} = c + \beta \int V(w', \pi') \, h_\pi(w') \, dw'    (5.28)
T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.4. SEARCH WITH OFFER DISTRIBUTION UNKNOWN

320

Together, (5.27) and (5.28) give

    V(w, \pi) = \max \left\{ \frac{w}{1-\beta},\; \frac{\bar{w}(\pi)}{1-\beta} \right\}    (5.29)

Combining (5.28) and (5.29), we obtain

    \frac{\bar{w}(\pi)}{1-\beta} = c + \beta \int \max \left\{ \frac{w'}{1-\beta},\; \frac{\bar{w}(\pi')}{1-\beta} \right\} h_\pi(w') \, dw'

Multiplying by 1 − β, substituting in π' = q(w', π), and using ∘ for composition of functions yields

    \bar{w}(\pi) = (1-\beta) c + \beta \int \max \left\{ w',\; \bar{w} \circ q(w', \pi) \right\} h_\pi(w') \, dw'    (5.30)

Equation (5.30) can be understood as a functional equation, where w̄ is the unknown function. Let's call it the reservation wage functional equation (RWFE). The solution w̄ to the RWFE is the object that we wish to compute.

Solving the RWFE

To solve the RWFE, we will first show that its solution is the fixed point of a contraction mapping. To this end, let

- b[0, 1] be the bounded real-valued functions on [0, 1]
- ‖ω‖ := sup_{x ∈ [0,1]} |ω(x)|

Consider the operator Q mapping ω ∈ b[0, 1] into Qω ∈ b[0, 1] via

    (Q\omega)(\pi) = (1-\beta) c + \beta \int \max \left\{ w',\; \omega \circ q(w', \pi) \right\} h_\pi(w') \, dw'    (5.31)


Comparing (5.30) and (5.31), we see that the set of fixed points of Q exactly coincides with the set of solutions to the RWFE: if Qw̄ = w̄ then w̄ solves (5.30), and vice versa.

Moreover, for any ω, ω̃ ∈ b[0, 1], basic algebra and the triangle inequality for integrals tell us that

    |(Q\omega)(\pi) - (Q\tilde{\omega})(\pi)| \leq \beta \int \left| \max\{w', \omega \circ q(w', \pi)\} - \max\{w', \tilde{\omega} \circ q(w', \pi)\} \right| h_\pi(w') \, dw'    (5.32)

Working case by case, it is easy to check that for real numbers a, b, c we always have

    |\max\{a, b\} - \max\{a, c\}| \leq |b - c|    (5.33)

Combining (5.32) and (5.33) yields

    |(Q\omega)(\pi) - (Q\tilde{\omega})(\pi)| \leq \beta \int \left| \omega \circ q(w', \pi) - \tilde{\omega} \circ q(w', \pi) \right| h_\pi(w') \, dw'    (5.34)

Taking the supremum over π now gives us

    \| Q\omega - Q\tilde{\omega} \| \leq \beta \| \omega - \tilde{\omega} \|    (5.35)

In other words, Q is a contraction of modulus β on the complete metric space (b[0, 1], ‖·‖). Hence

- a unique solution w̄ to the RWFE exists in b[0, 1]
- Q^k ω → w̄ uniformly as k → ∞, for any ω ∈ b[0, 1]

The following exercise asks you to exploit these facts to compute an approximation to w̄.
Exercises
uniformly as k Exercise 1 For arbitrary initial condition b[0, 1], we know that Qk w and plot it Use this fact to compute an approximation to w Hints: Start by implementing Q as a function It might be helpful to model this function loosely on the function bellman() from odu_vfi.py see above The function compute_fixed_point() from the module compute_fp is convenient for computing xed points and can be found in the main repository Assuming you adopt the default parameters, your result should coincide closely with the gure for the optimal policy shown above Try experimenting with different parameters, and conrm that the change in the optimal policy coincides with your intuition Solution: View solution T HOMAS S ARGENT AND J OHN S TACHURSKI February 5, 2014

5.5. OPTIMAL SAVINGS

322

5.5 Optimal Savings


Overview
Next we study the standard optimal savings problem for an innitely lived consumerthe common ancestor described in RMT3, section 1.3 Also known as the income uctuation problem An important sub-problem for many representative macroeconomic models [Aiyagari1994] [Huggett1993] etc. Useful references include [Deaton1991], [DenHaan2010], [Kuhn2013], [Rabault2002], [Reiter2008] and [SchechtmanEscudero1977] Our presentation of the model will be relatively brief For further details on economic intuition, implication and models, see RMT3 Proofs of all mathematical results stated below can be found in this paper In this lecture we will explore an alternative to value function iteration (VFI) called policy function iteration (PFI) Based on the Euler equation, and not to be confused with Howards policy iteration algorithm Globally convergent under mild assumptions, even when utility is unbounded (both above and below) Numerically, turns out to be faster and more efcient than VFI for this model Model features Innite horizon dynamic programming with two states and one control

The Optimal Savings Problem


Consider a household that chooses a state-contingent consumption plan {ct }t0 to maximize E subject to ct + at+1 Rat + zt , Here (0, 1) is the discount factor T HOMAS S ARGENT AND J OHN S TACHURSKI February 5, 2014 ct 0, at b t = 0, 1, . . . (5.36)

t =0

t u(ct )

5.5. OPTIMAL SAVINGS

323

at is asset holdings at time t, with ad-hoc borrowing constraint at b ct is consumption zt is non-capital income (wages, unemployment compensation, etc.) R := 1 + r, where r > 0 is the interest rate on savings Assumptions 1. {zt } is a nite Markov process with Markov matrix taking values in Z 2. | Z | < and Z (0, ) 3. r > 0 and R < 1 4. u is smooth, strictly increasing and strictly concave with limc0 u (c) limc u (c) = 0 The asset space is [b, ) and the state is the pair ( a, z) S := [b, ) Z A feasible consumption path from ( a, z) S is a consumption sequence {ct } such that {ct } and its induced asset path { at } satisfy 1. ( a0 , z0 ) = ( a, z) 2. the feasibility constraints in (5.36), and 3. measurability of ct w.r.t. the ltration generated by {z1 , . . . , zt } The meaning of the third point is just that consumption at time t can only be a function of outcomes that have already been observed The value function V : S R is dened by V ( a, z) := sup E and

t =0

t u(ct )

(5.37)

where the supremum is over all feasible consumption paths from ( a, z). An optimal consumption path from ( a, z) is a feasible consumption path from ( a, z) that attains the supremum in (5.37) Given our assumptions, it is known that 1. For each ( a, z) S, a unique optimal consumption path from ( a, z) exists 2. This path is the unique feasible path from ( a, z) satisfying the Euler equality u (ct ) = max R Et [u (ct+1 )] , u ( Rat + zt + b) and the transversality condition
t

(5.38)

lim t E [u (ct ) at+1 ] = 0.

(5.39)

Moreover, there exists an optimal consumption function c : S [0, ) such that the path from ( a, z) generated by

( a0 , z0 ) = ( a, z ),

zt+1 (zt , dy),

ct = c ( at , zt )

and

at+1 = Rat + zt ct February 5, 2014


satisfies both (5.38) and (5.39), and hence is the unique optimal path from (a, z). In summary, to solve the optimization problem, we need to compute c*.
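Once a consumption function is in hand, the implied asset dynamics can be simulated directly from the law of motion in (5.36). Here is a toy sketch; the consumption rule is a hypothetical placeholder (not the optimal c*), and iid income draws stand in for the Markov process {z_t}:

```python
import numpy as np

# Simulate a_{t+1} = R a_t + z_t - c(a_t, z_t) under a placeholder rule
r, b = 0.01, 0.0
R = 1 + r
rng = np.random.default_rng(0)
z_path = rng.choice([0.5, 1.0], size=200)     # iid stand-in for {z_t}
c_rule = lambda a, z: 0.9 * (R * a + z + b)   # hypothetical: consume 90% of resources

a = 0.0
for z in z_path:
    a = R * a + z - c_rule(a, z)
```

With this rule, assets never violate the borrowing constraint and settle into a bounded range.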

Computation
There are two standard ways to solve for c*:

1. Value function iteration (VFI)
2. Policy function iteration (PFI) using the Euler inequality

Policy function iteration

We can rewrite (5.38) to make it a statement about functions rather than random variables. In particular, consider the functional equation

    u' \circ c(a, z) = \max \left\{ \gamma \int u' \circ c(R a + z - c(a, z), \check{z}) \, \Pi(z, d\check{z}),\; u'(R a + z + b) \right\}    (5.40)

where γ := βR and u' ∘ c(s) := u'(c(s)). Equation (5.40) is a functional equation in c. In order to identify a solution, let C be the set of candidate consumption functions c : S → ℝ such that

- each c ∈ C is continuous and (weakly) increasing
- min Z ≤ c(a, z) ≤ R a + z + b for all (a, z) ∈ S

In addition, let K : C → C be defined as follows: for given c ∈ C, the value Kc(a, z) is the unique t ∈ J(a, z) that solves

    u'(t) = \max \left\{ \gamma \int u' \circ c(R a + z - t, \check{z}) \, \Pi(z, d\check{z}),\; u'(R a + z + b) \right\}    (5.41)

where J(a, z) := {t ∈ ℝ : min Z ≤ t ≤ R a + z + b}. We refer to K as Coleman's policy function operator [Coleman1990]. It is known that K is a contraction mapping on C under the metric

    \rho(c, d) := \| u' \circ c - u' \circ d \| := \sup_{s \in S} | u'(c(s)) - u'(d(s)) | \qquad (c, d \in \mathcal{C})    (5.42)

The metric ρ is complete on C, and convergence in ρ implies uniform convergence on compacts.


In consequence, K has a unique fixed point c* ∈ C, and K^n c → c* as n → ∞ for any c ∈ C. By the definition of K, the fixed points of K in C coincide with the solutions to (5.40) in C. In particular, it can be shown that the path {c_t} generated from (a_0, z_0) ∈ S using policy function c* is the unique optimal path from (a_0, z_0) ∈ S.

TL;DR: The unique optimal policy can be computed by picking any c ∈ C and iterating with the operator K defined in (5.41).

Value function iteration

The Bellman operator for this problem is given by

    Tv(a, z) = \max_{0 \leq c \leq R a + z + b} \left\{ u(c) + \beta \int v(R a + z - c, \check{z}) \, \Pi(z, d\check{z}) \right\}    (5.43)

We have to be careful with VFI (i.e., iterating with T) in this setting because u is not assumed to be bounded.

- In fact u is typically unbounded both above and below, e.g. u(c) = log c
- In that case the standard DP theory does not apply
- T^n v is not guaranteed to converge to the value function for arbitrary continuous bounded v

Nonetheless, we can always try the strategy "iterate and hope". In this case we can check the outcome by comparing with PFI, which is known to converge, as described above.

Implementation

The following code provides implementations of both VFI and PFI, in file ifp.py, provided in the main repository. Description and clarifications are given below.
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: ifp.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Functions for solving the income fluctuation problem. Iteration with either the Coleman or Bellman operators from appropriate initial conditions leads to convergence to the optimal consumption policy. The income process is a finite state Markov chain. Note that the Coleman operator is the preferred method, as it is almost always faster and more accurate. The Bellman operator is only provided for comparison. """ import numpy as np from scipy.optimize import fminbound, brentq from scipy import interp





class consumerProblem: """ This class is just a "struct" to hold the collection of parameters defining the consumer problem. """ def __init__(self, r=0.01, beta=0.96, Pi=((0.6, 0.4), (0.05, 0.95)), z_vals=(0.5, 1.0), b=0, grid_max=16, grid_size=50, u=np.log, du=lambda x: 1/x): """ Parameters: * * * * * * r and beta are scalars with r > 0 and (1 + r) * beta < 1 Pi is a 2D NumPy array --- the Markov matrix for {z_t} z_vals is an array/list containing the state space of {z_t} u is the utility function and du is the derivative b is the borrowing constraint grid_max and grid_size describe the grid used in the solution

""" self.u, self.du = u, du self.r, self.R = r, 1 + r self.beta, self.b = beta, b self.Pi, self.z_vals = np.array(Pi), tuple(z_vals) self.asset_grid = np.linspace(-b, grid_max, grid_size) def bellman_operator(cp, V, return_policy=False): """ The approximate Bellman operator, which computes and returns the updated value function TV (or the V-greedy policy c if return_policy == True). Parameters: * cp is an instance of class consumerProblem * V is a NumPy array of dimension len(cp.asset_grid) x len(cp.z_vals) """ # === simplify names, set up arrays === # R, Pi, beta, u, b = cp.R, cp.Pi, cp.beta, cp.u, cp.b asset_grid, z_vals = cp.asset_grid, cp.z_vals new_V = np.empty(V.shape) new_c = np.empty(V.shape) z_index = range(len(z_vals)) # === linear interpolation of V along the asset grid === #





vf = lambda a, i_z: interp(a, asset_grid, V[:, i_z]) # === solve r.h.s. of Bellman equation === # for i_a, a in enumerate(asset_grid): for i_z, z in enumerate(z_vals): def obj(c): # objective function to be *minimized* y = sum(vf(R * a + z - c, j) * Pi[i_z, j] for j in z_index) return - u(c) - beta * y c_star = fminbound(obj, np.min(z_vals), R * a + z + b) new_c[i_a, i_z], new_V[i_a, i_z] = c_star, -obj(c_star) if return_policy: return new_c else: return new_V def coleman_operator(cp, c): """ The approximate Coleman operator. Iteration with this operator corresponds to policy function iteration. Computes and returns the updated consumption policy c. Parameters: * cp is an instance of class consumerProblem * c is a NumPy array of dimension len(cp.asset_grid) x len(cp.z_vals) The array c is replaced with a function cf that implements univariate linear interpolation over the asset grid for each possible value of z. """ # === simplify names, set up arrays === # R, Pi, beta, du, b = cp.R, cp.Pi, cp.beta, cp.du, cp.b asset_grid, z_vals = cp.asset_grid, cp.z_vals z_size = len(z_vals) gamma = R * beta vals = np.empty(z_size) # === linear interpolation to get consumption function === # def cf(a): """ The call cf(a) returns an array containing the values c(a, z) for each z in z_vals. For each such z, the value c(a, z) is constructed by univariate linear approximation over asset space, based on the values in the array c """ for i in range(z_size): vals[i] = interp(a, cp.asset_grid, c[:, i]) return vals # === solve for root to get Kc === # Kc = np.empty(c.shape) for i_a, a in enumerate(asset_grid):





for i_z, z in enumerate(z_vals): def h(t): expectation = np.dot(du(cf(R * a + z - t)), Pi[i_z, :]) return du(t) - max(gamma * expectation, du(R * a + z + b)) Kc[i_a, i_z] = brentq(h, np.min(z_vals), R * a + z + b) return Kc def initialize(cp): """ Creates a suitable initial conditions V and c for value function and policy function iteration respectively. * cp is an instance of class consumerProblem. """ # === simplify names, set up arrays === # R, beta, u, b = cp.R, cp.beta, cp.u, cp.b asset_grid, z_vals = cp.asset_grid, cp.z_vals shape = len(asset_grid), len(z_vals) V, c = np.empty(shape), np.empty(shape) # === populate V and c === # for i_a, a in enumerate(asset_grid): for i_z, z in enumerate(z_vals): c_max = R * a + z + b c[i_a, i_z] = c_max V[i_a, i_z] = u(c_max) / (1 - beta) return V, c

The code contains the following definitions:

- class consumerProblem, which produces "struct"-type objects that contain all the relevant parameters of a given model
- function bellman_operator(), which implements the Bellman operator T specified above
- function coleman_operator(), which implements the Coleman operator K specified above
- function initialize(), which generates suitable initial conditions for iteration

The functions bellman_operator() and coleman_operator() both use linear interpolation along the asset grid to approximate the value and consumption functions. The following exercises walk you through several applications where policy functions are computed.

In exercise 1 you will see that while VFI and PFI produce similar results, the latter is much faster, because we are exploiting analytically derived first order conditions. Another benefit of working in policy function space rather than value function space is that value functions typically have more curvature, which makes them harder to approximate numerically.
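The root-finding step inside coleman_operator() can be illustrated in isolation at a single state point. The parameters and the initial consumption guess below are illustrative (the guess is "consume all resources next period"):

```python
import numpy as np
from scipy.optimize import brentq

# One Coleman update Kc(a, z) at a single point, sketching the root-finding
# step in (5.41); log utility, so du(c) = 1/c
r, beta, b = 0.01, 0.96, 0.0
R, gamma = 1 + r, (1 + r) * 0.96
z_vals = np.array([0.5, 1.0])
Pi_row = np.array([0.6, 0.4])          # transition probabilities from current z
du = lambda x: 1 / x
a, z = 1.0, 0.5                        # current state

def cf(a_next):
    # illustrative guess for next-period consumption: consume everything
    return R * a_next + z_vals + b

def h(t):
    expectation = Pi_row @ du(cf(R * a + z - t))
    return du(t) - max(gamma * expectation, du(R * a + z + b))

Kc = brentq(h, 1e-8, R * a + z + b - 1e-8)   # the updated consumption value
```

h is positive for tiny t (marginal utility explodes) and negative near the resource limit, so brentq has a valid bracket.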





Exercises
Exercise 1 The first exercise is to replicate the following figure, which compares PFI and VFI as solution methods.

The figure shows consumption policies computed by iteration of K and T respectively.

- In the case of iteration with T, the final value function is used to compute the observed policy
- Consumption is shown as a function of assets with income z held fixed at its smallest value

The following details are needed to replicate the figure:

- The parameters are the default parameters in the definition of consumerProblem
- The initial conditions are the default ones from initialize()
- Both operators are iterated 80 times

When you run your code you will observe that iteration with K is faster than iteration with T. If you are using IPython, a comparison of the operators can be made as follows
In [12]: run ifp In [13]: cp = consumerProblem() In [14]: v, c = initialize(cp) In [15]: timeit bellman_operator(cp, v) 10 loops, best of 3: 157 ms per loop

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.5. OPTIMAL SAVINGS

330

In [16]: timeit coleman_operator(cp, c) 10 loops, best of 3: 29.5 ms per loop

The output shows that the Coleman operator is about 5 times faster. From now on we will only use the Coleman operator.

Solution: View solution

Exercise 2 Next let's consider how the interest rate affects consumption. Reproduce the following figure, which shows (approximately) optimal consumption policies for different interest rates.

- Other than r, all parameters are at their default values
- r steps through np.linspace(0, 0.04, 4)
- Consumption is plotted against assets for income shock fixed at the smallest value

The figure shows that higher interest rates boost savings and hence suppress consumption.

Solution: View solution

Exercise 3 Now let's consider the long run asset levels held by households. We'll take r = 0.03 and otherwise use default parameters. The following figure is a 45 degree diagram showing the law of motion for assets when consumption is optimal.





The green line and blue line represent the function

    a' = h(a, z) := R a + z - c^*(a, z)

when income z takes its high and low values respectively. The dashed line is the 45 degree line.

We can see from the figure that the dynamics will be stable: assets do not diverge. In fact there is a unique stationary distribution of assets that we can calculate by simulation.

- Existence can be proved via theorem 2 of [HopenhaynPrescott1992]
- It represents the long run dispersion of assets across households when households have idiosyncratic shocks

Ergodicity is valid here, so stationary probabilities can be calculated by averaging over a single long time series. Hence to approximate the stationary distribution we can simulate a long time series for assets and histogram it, as in the following figure.

Your task is to replicate the figure:

- Parameters are as discussed above
- The histogram in the figure used a single time series {a_t} of length 500,000
- Given the length of this time series, the initial condition (a_0, z_0) will not matter
- You might find it helpful to use the module mc_sample in the main repository

Note that the simulations will be relatively slow due to the inherent need for loops; we'll talk about how to speed up this kind of code a bit later on.
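A minimal stand-in for the mc_sample module mentioned above, which simulates a path of a finite Markov chain from its transition matrix:

```python
import numpy as np

def mc_sample(Pi, init=0, sample_size=1000, seed=1234):
    """Simulate a path of a finite Markov chain with transition matrix Pi."""
    rng = np.random.default_rng(seed)
    Pi = np.asarray(Pi)
    path = np.empty(sample_size, dtype=int)
    path[0] = init
    for t in range(sample_size - 1):
        path[t + 1] = rng.choice(Pi.shape[1], p=Pi[path[t]])
    return path
```

With the default Markov matrix from consumerProblem, the fraction of time spent in each state converges to the stationary distribution, which is exactly the time-averaging property the histogram exercise relies on.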



Solution: View solution

Exercise 4 Following on from exercises 2 and 3, let's look at how savings and aggregate asset holdings vary with the interest rate.

Note: RMT3 section 18.6 can be consulted for more background on the topic treated in this exercise.

For a given parameterization of the model, the mean of the stationary distribution can be interpreted as aggregate capital in an economy with a unit mass of ex-ante identical households facing idiosyncratic shocks. Let's look at how this measure of aggregate capital varies with the interest rate and borrowing constraint.

The next figure plots aggregate capital against the interest rate for b in (1, 3). As is traditional, the price (interest rate) is on the vertical axis; the horizontal axis is aggregate capital computed as the mean of the stationary distribution.

Exercise 4 is to replicate the figure, making use of code from previous exercises. Try to explain why the measure of aggregate capital is equal to b when r = 0 for both cases shown here.

Solution: View solution





5.6 Robustness
Overview
This lecture modifies a Bellman equation to express a decision maker's doubts about transition dynamics. His specification doubts make the decision maker want a robust decision rule, where robust means insensitive to misspecification of transition dynamics.

The decision maker has a single approximating model. He calls it approximating to acknowledge that he doesn't completely trust it. He fears that outcomes will actually be determined by another model that he cannot describe explicitly. All that he knows is that the actual data-generating model is in some (uncountable) set of models that surrounds his approximating model.

He quantifies the discrepancy between his approximating model and the genuine data-generating model by using a quantity called entropy. (We'll explain what entropy means below.) He wants a decision rule that will work well enough no matter which of those other models actually governs outcomes. This is what it means for his decision rule to be robust to misspecification of the approximating model.

This may sound like too much to ask for, but a secret weapon is available to design robust decision rules.

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.6. ROBUSTNESS

334

The secret weapon is max-min control theory. A value-maximizing decision maker enlists the aid of an (imaginary) value-minimizing decision maker to construct bounds on the value attained by a given decision rule under different models of the transition dynamics. The original decision maker uses those bounds to construct a decision rule with an assured performance level, no matter which model actually governs outcomes.

Note: In reading this lecture, please don't think that our decision maker is paranoid when he conducts a worst-case analysis. By designing a rule that works well against a worst case, his intention is to construct a rule that will work well across a set of models.

Sets of Models Imply Sets Of Values

Our robust decision maker wants to know how well a given rule will work when there is not a single transition law; he wants to know a set of values that will be attained under a set of transition laws for a given decision rule F. Ultimately, he wants to design a rule F that shapes that set of values in ways that he prefers. With this in mind, consider the following graph, which relates to a particular decision problem explained below.

The figure shows a value-entropy correspondence for a particular decision rule F. The shaded set is the graph of the correspondence, which maps entropy to a set of values associated with a set of models that surround the decision maker's approximating model. Here



- Value refers to a sum of discounted rewards obtained by following the decision rule F when the state starts at some fixed initial state x_0
- Entropy is a nonnegative number that measures the size of a set of models surrounding the decision maker's approximating model
  - Entropy is zero when the decision maker completely trusts the approximating model
  - Entropy is bigger, and the set of surrounding models is bigger, the less the decision maker trusts the approximating model

The shaded region indicates that for all models having entropy less than or equal to the number on the horizontal axis, the value obtained will be somewhere within the indicated set of values.

Now let's compare two different decision rules, F_r and F_b. In the next figure,

- the red set shows the value-entropy correspondence for decision rule F_r
- the blue set shows the value-entropy correspondence for decision rule F_b

The blue correspondence is skinnier than the red correspondence. This conveys the sense in which the decision rule F_b is more robust than the decision rule F_r: more robust means that the set of values is less sensitive to increasing misspecification as measured by entropy. Notice that the less robust rule F_r promises higher values for small misspecifications (small entropy).

Below we'll explain in detail how to construct these sets of values for a given F, but for now, here is a hint about the secret weapons we'll use to construct them:



- We'll use some min-max problems to construct the lower bounds
- We'll use some max-max problems to construct the upper bounds

We will also describe how to choose F to shape the sets of values.

Inspiring Video If you want to understand more about why one serious quantitative researcher is interested in this approach, we recommend Lars Peter Hansen's Nobel lecture.

Other References Our discussion in this lecture is based on [HansenSargent2000] and [HansenSargent2008].

The Model
For simplicity, we present ideas in the context of a class of problems with linear transition laws and quadratic objective function To t in with our earlier lecture on LQ control, we will treat loss minimization rather than value maximization To begin, recall the innite horizon LQ problem, where an agent chooses a sequence of controls {ut } to minimize
t =0

xt Rxt + ut Qut

(5.44)

subject to the linear law of motion xt+1 = Axt + But + Cwt+1 , As before, xt is n 1, A is n n ut is k 1, B is n k wt is j 1, C is n j R is n n and Q is k k Here xt is the state, ut is the control, and wt is a shock vector. For now we take {wt } to be deterministic a single xed sequence We also allow for model uncertainty on the part of the agent solving this optimization problem In particular, the agent takes wt = 0 for all t as the benchmark case, but admits the possibility that this model might be wrong As a consequence, she also considers a set of alternative models expressed in terms of sequences {wt } that are relatively close to the zero sequence t = 0, 1, 2, . . . (5.45)

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.6. ROBUSTNESS

337

She seeks a policy that will do well enough for a set of alternative models whose members are pinned down by sequences {w_t}. Soon we'll quantify the quality of a model specification in terms of the maximal size of the expression

    \sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}

Constructing More Robust Policies


If our agent takes {w_t} as a given deterministic sequence, then, drawing on intuition from earlier lectures on dynamic programming, we can anticipate Bellman equations such as

    J_{t-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta J_t(A x + B u + C w_{t+1}) \right\}

(Here J depends on t because the sequence {w_t} is not recursive.)

Our tool for studying robustness is to construct a rule that works well even if a perverse sequence {w_t} occurs. In our framework, perverse means loss increasing. As we'll see, this will eventually lead us to construct the Bellman equation

    J(x) = \min_u \max_w \left\{ x' R x + u' Q u + \beta \left[ J(A x + B u + C w) - \theta w' w \right] \right\}    (5.46)

Notice that we've added the penalty term −θ w'w. Since w'w = ‖w‖², this term becomes influential when w moves away from the origin. The penalty parameter θ controls how much we penalize the maximizing agent for harming the minimizing agent. By raising θ more and more, we more and more limit the ability of the maximizing agent to distort outcomes relative to the approximating model. So bigger θ is implicitly associated with smaller distortion sequences, with w_t = 0 for all t in the limit.

Analyzing the Bellman equation

So what does J in (5.46) look like? As with the ordinary LQ control model, J takes the form J(x) = x'Px for some symmetric positive definite matrix P. One of our main tasks will be to analyze and compute the matrix P. First, using matrix calculus, you will be able to verify that

    \max_w \left\{ (A x + B u + C w)' P (A x + B u + C w) - \theta w' w \right\} = (A x + B u)' \mathcal{D}(P) (A x + B u)

where

    \mathcal{D}(P) := P + P C (\theta I - C' P C)^{-1} C' P    (5.47)

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.6. ROBUSTNESS

338

and I is a j j identity matrix. Substituting the expression for the maximum into (5.46) yields x Px = min{ x Rx + u Qu + ( Ax + Bu) D ( P)( Ax + Bu)}
u

(5.48)

Using similar arguments, the solution to this minimization problem is u = Fx where F := ( Q + B D ( P ) B ) 1 B D ( P ) A Substituting this minimizer back into (5.48) and working through the algebra gives x Px = x B (D ( P)) x for all x, or, equivalently, P = B (D ( P)) where D is the operator dened in (5.47) and

B ( P) := R 2 A PB( Q + B PB)1 B PA + A PA
The operator B is the standard (i.e., non-robust) LQ Bellman operator, and P = B ( P) is the standard Bellman equation see this discussion Under some regularity conditions (see [HansenSargent2008]), the operator B D has a unique positive denite solution, which we denote below by P where A robust policy, indexed by is u = Fx := ( Q + B D ( P ) B ) 1 B D ( P )A F We also dene := ( I C PC ) 1 C P ( A BF ) K (5.49) (5.50)

is that wt+1 = Kx t on the worst-case path, in the sense that this vector is The interpretation of K the maximizer of (??) evaluated at u = Fx , F , K are all determined by the primitives and Note that P Note also that if is very large, then D is approximately equal to the identity mapping and F are approximately equal to their standard LQ Bellman solution Hence, when is large, P and policy function forms is approximately equal to zero Furthermore, when is large, K Conversely, smaller are to be associated with greater fear of model misspecication, and greater concern for robustness
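To make the preceding operators concrete, here is a minimal scalar sketch that computes P̂ by iterating the composition B ∘ D and then recovers F̂ and K̂ from (5.49) and (5.50). All parameter values below are hypothetical, chosen only so that the iteration converges and θ stays above the breakdown point θ > C'P̂C.

```python
# Scalar sketch: compute P-hat by iterating P <- B(D(P)), then F-hat, K-hat.
# Hypothetical parameter values, for illustration only.
A, B, C = 1.0, 1.0, 0.5
R, Q = 1.0, 1.0
beta, theta = 0.95, 10.0

def D(P):
    # D(P) = P + P C (theta I - C'P C)^{-1} C'P, specialized to scalars
    return P + P * C * C * P / (theta - C * P * C)

def B_op(P):
    # B(P) = R - beta^2 A'P B (Q + beta B'P B)^{-1} B'P A + beta A'P A
    return R - beta**2 * (A * P * B)**2 / (Q + beta * B * P * B) + beta * A * P * A

P = 0.0
for _ in range(1000):
    P_new = B_op(D(P))
    err = abs(P_new - P)
    P = P_new
    if err < 1e-12:
        break

F = beta * B * D(P) * A / (Q + beta * B * D(P) * B)  # robust rule u = -F x
K = C * P * (A - B * F) / (theta - C * P * C)        # worst-case rule w_{t+1} = K x_t
print(P, F, K)
```

The same iteration, in matrix form, is what `RBLQ.robust_rule_simple()` does in the implementation later in this section.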

Robustness as Outcome of a Two-Person Zero-Sum Game


It is helpful to have a second interpretation of our results. This time we frame the problem as a two-person zero-sum game, and show that F̂, K̂ are Nash equilibrium objects.

Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting the possibility of misspecification. Agent 2 is an imaginary malevolent player.

Agent 2's malevolence helps the original agent compute bounds on his value function across a set of models.

We begin with agent 2's problem.

Agent 2's Problem Agent 2

1. observes a fixed policy F specifying the behavior of agent 1, in the sense that u_t = −F x_t for all t
2. responds by choosing a shock sequence {w_t} from a set of paths sufficiently close to the benchmark sequence {0, 0, 0, ...}

A natural way to say "sufficiently close to the zero sequence" is to restrict the summed inner product Σ_{t=1}^∞ w_t' w_t to be small. However, to obtain a recursive formulation it turns out to be convenient to use a kind of discounted inner product, leading to the constraint

    Σ_{t=1}^∞ β^t w_t' w_t ≤ η    (5.51)

Now let F be a fixed policy, and let J_F(x_0, w) be the present-value cost of that policy given sequence w := {w_t} and initial condition x_0 ∈ R^n. Substituting −F x_t for u_t in (5.44), this value can be written as

    J_F(x_0, w) := Σ_{t=0}^∞ β^t x_t' (R + F'Q F) x_t    (5.52)

where

    x_{t+1} = (A − B F) x_t + C w_{t+1}    (5.53)

and the initial condition x_0 is as specified in the left-hand side of (5.52).

Agent 2 chooses w to maximize agent 1's loss J_F(x_0, w) subject to (5.51). Using a Lagrangian formulation, we can express this problem as

    max_w Σ_{t=0}^∞ β^t { x_t' (R + F'Q F) x_t − β θ (w_{t+1}' w_{t+1} − η) }

where {x_t} is given by (5.53) and θ is the Lagrange multiplier on constraint (5.51). For the moment, let's take θ as fixed, allowing us to drop the constant term in the objective function, and hence write the problem as

    max_w Σ_{t=0}^∞ β^t { x_t' (R + F'Q F) x_t − β θ w_{t+1}' w_{t+1} }

or, equivalently,

    min_w Σ_{t=0}^∞ β^t { −x_t' (R + F'Q F) x_t + β θ w_{t+1}' w_{t+1} }    (5.54)

subject to (5.53)

What's striking about this optimization problem is that it is once again an LQ problem, with w = {w_t} as the sequence of controls. The expression for the optimal policy can be found by applying the usual LQ formula (see here). We denote it by K(F, θ), with the interpretation w_{t+1} = K(F, θ) x_t.

The remaining step for agent 2's problem is to enforce the constraint (5.51), which can be done by choosing θ such that

    β Σ_{t=0}^∞ β^t x_t' K(F, θ)' K(F, θ) x_t = η    (5.55)

Here x_t is given by (5.53), which in this case becomes

    x_{t+1} = (A − B F + C K(F, θ)) x_t

Using Agent 2's Problem to Construct Bounds on the Value Sets

The Lower Bound Define the minimized object on the right side of problem (5.54) as R_θ(x_0, F). Because "minimizers minimize" we have

    R_θ(x_0, F) ≤ Σ_{t=0}^∞ β^t ( −x_t' (R + F'Q F) x_t ) + β θ Σ_{t=0}^∞ β^t w_{t+1}' w_{t+1}

where {x_t} is generated by (5.53) under any feasible perturbation {w_t}, and x_0 is a given initial condition. This inequality in turn implies the inequality

    R_θ(x_0, F) − θ ent ≤ −Σ_{t=0}^∞ β^t x_t' (R + F'Q F) x_t    (5.56)

where

    ent := β Σ_{t=0}^∞ β^t w_{t+1}' w_{t+1}

The left side of inequality (5.56), regarded as a function of ent, is a straight line with slope −θ. Technically, it is an example of a separating hyperplane. At a particular value of entropy, the line is tangent to the lower bound of values as a function of entropy. In particular, the lower bound on the left side of (5.56) is attained when

    ent = β Σ_{t=0}^∞ β^t x_t' K(F, θ)' K(F, θ) x_t    (5.57)

To construct the lower bound on the set of values associated with all perturbations w satisfying the entropy constraint at a given entropy level, we proceed as follows:

- For a given θ, solve the minimization problem (5.54)
- Compute the minimized value R_θ(x_0, F) and the associated entropy using (5.57)
- Compute the lower bound on the value function R_θ(x_0, F) − θ ent and plot it against ent
- Repeat the preceding three steps for a range of values of θ to trace out the lower bound

Note: This procedure sweeps out a set of separating hyperplanes indexed by different values for the Lagrange multiplier θ
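As a small sanity check on (5.57): in a scalar model, the closed-loop law of motion is x_{t+1} = a_cl x_t with a_cl := A − B F + C K(F, θ), so the discounted entropy is a geometric sum with an explicit closed form. The numbers below are hypothetical.

```python
# Discounted entropy along a scalar closed-loop path x_{t+1} = a_cl * x_t,
# with worst-case rule w_{t+1} = K x_t.  The sum in (5.57) is geometric:
#     ent = beta * K^2 * x0^2 / (1 - beta * a_cl^2)
# Hypothetical parameter values, for illustration only.
beta, a_cl, K, x0 = 0.95, 0.6, 0.3, 1.0

closed_form = beta * K**2 * x0**2 / (1 - beta * a_cl**2)

# Direct summation of beta * sum_t beta^t (K x_t)^2
total, x = 0.0, x0
for t in range(200):
    total += beta**(t + 1) * (K * x)**2
    x = a_cl * x

print(closed_form, total)
```

The general matrix version of this sum is what `var_quadratic_sum` computes in `compute_deterministic_entropy()` in the implementation below.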

The Upper Bound To construct an upper bound we use a very similar procedure. We simply replace the minimization problem (5.54) with the maximization problem

    V_θ̃(x_0, F) = max_w Σ_{t=0}^∞ β^t { −x_t' (R + F'Q F) x_t − β θ̃ w_{t+1}' w_{t+1} }    (5.58)

where now θ̃ > 0 penalizes the choice of w with larger entropy. (Notice that θ̃ = −θ in problem (5.54).) Because "maximizers maximize" we have

    V_θ̃(x_0, F) ≥ −Σ_{t=0}^∞ β^t x_t' (R + F'Q F) x_t − β θ̃ Σ_{t=0}^∞ β^t w_{t+1}' w_{t+1}

which in turn implies the inequality

    V_θ̃(x_0, F) + θ̃ ent ≥ −Σ_{t=0}^∞ β^t x_t' (R + F'Q F) x_t    (5.59)

where

    ent := β Σ_{t=0}^∞ β^t w_{t+1}' w_{t+1}

The left side of inequality (5.59), regarded as a function of ent, is a straight line with slope θ̃. The upper bound on the left side of (5.59) is attained when

    ent = β Σ_{t=0}^∞ β^t x_t' K(F, −θ̃)' K(F, −θ̃) x_t    (5.60)

To construct the upper bound on the set of values associated with all perturbations w with a given entropy we proceed much as we did for the lower bound:

- For a given θ̃, solve the maximization problem (5.58)
- Compute the maximized value V_θ̃(x_0, F) and the associated entropy using (5.60)
- Compute the upper bound on the value function V_θ̃(x_0, F) + θ̃ ent and plot it against ent
- Repeat the preceding three steps for a range of values of θ̃ to trace out the upper bound

Now, in the interest of reshaping these sets of values by choosing F, we turn to agent 1's problem.

Agent 1's Problem Now we turn to agent 1, who solves

    min_{u_t} Σ_{t=0}^∞ β^t { x_t' R x_t + u_t' Q u_t − β θ w_{t+1}' w_{t+1} }    (5.61)

taking {w_t} as given by w_{t+1} = K x_t. In other words, agent 1 minimizes

    Σ_{t=0}^∞ β^t { x_t' (R − β θ K'K) x_t + u_t' Q u_t }    (5.62)

subject to

    x_{t+1} = (A + C K) x_t + B u_t

Once again, the expression for the optimal policy can be found here; we denote it by F̃.

Nash Equilibrium Clearly the F̃ we have obtained depends on K, which, in agent 2's problem, depended on an initial policy F. Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where

    F̃ = Φ(K(F, θ))    (5.63)

The map F ↦ Φ(K(F, θ)) corresponds to the procedure where

1. agent 1 uses an arbitrary initial policy F
2. agent 2 best responds to agent 1 by choosing K(F, θ)
3. agent 1 best responds to agent 2 by choosing F̃ = Φ(K(F, θ))

As you may have already guessed, the robust policy F̂ defined in (5.49) is a fixed point of this mapping. In particular, for any given θ,

1. K(F̂, θ) = K̂, where K̂ is as given in (5.50)
2. Φ(K̂) = F̂

A sketch of the proof is given in the appendix.
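The best-response procedure in steps 1 to 3 can be sketched numerically without any of the machinery below, using only a hand-rolled scalar Riccati iteration. The parameter values are hypothetical, chosen so that θ stays above the breakdown point and the iteration converges.

```python
# Best-response iteration for the two-person zero-sum game, scalar case.
# Hypothetical parameter values, for illustration only.
def lq_solve(q, r, a, b, beta, tol=1e-12, max_iter=50_000):
    """Iterate the discounted scalar Riccati equation
    p = r + beta a p a - (beta a p b)^2 / (q + beta b p b)
    for min sum_t beta^t (x'rx + u'qu) s.t. x' = ax + bu; return (f, p), u = -f x."""
    p = 0.0
    for _ in range(max_iter):
        p_new = r + beta * a * p * a - (beta * a * p * b)**2 / (q + beta * b * p * b)
        if abs(p_new - p) < tol:
            p = p_new
            break
        p = p_new
    f = beta * b * p * a / (q + beta * b * p * b)
    return f, p

A, B, C = 0.9, 1.0, 0.5
R, Q = 1.0, 1.0
beta, theta = 0.95, 50.0

F = 0.0                                   # step 1: arbitrary initial policy
for _ in range(200):
    # step 2: agent 2 best responds -- an LQ problem with control w
    k, _ = lq_solve(beta * theta, -(R + F * Q * F), A - B * F, C, beta)
    K = -k                                # w_{t+1} = K x_t
    # step 3: agent 1 best responds -- an LQ problem with control u
    F_new, _ = lq_solve(Q, R - beta * theta * K * K, A + C * K, B, beta)
    if abs(F_new - F) < 1e-10:
        F = F_new
        break
    F = F_new

print(F, K)
```

At convergence, (F, K) is a fixed point of the mapping described above, which is what the `F_to_K()` and `K_to_F()` methods exploit in the implementation later in this section.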

The Stochastic Case


Now we turn to the stochastic case, where the sequence {w_t} is treated as an iid sequence of random vectors. In this setting, we suppose that our agent is uncertain about the distribution of w_t. The agent takes the standard normal distribution N(0, I) as the baseline, while admitting the possibility that other nearby distributions might in fact be correct. To implement this idea, we need a notion of what it means for one distribution to be near another one.


Here we adopt a very common measure of closeness for distributions, known as relative entropy, or Kullback-Leibler divergence. For densities p, q, the Kullback-Leibler divergence of q from p is defined to be

    D_KL(p, q) := ∫ ln[ p(x) / q(x) ] p(x) dx
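For Gaussian densities this divergence has a simple closed form, which is part of what makes the problem below tractable: for p = N(μ, Σ) and q = N(0, I) in k dimensions, D_KL(p, q) = (1/2)[tr(Σ) + μ'μ − k − ln det Σ]. Here is a quick numerical sanity check (the example distributions are arbitrary):

```python
import numpy as np

# Check the Gaussian closed form for D_KL(p, q) against a Monte Carlo
# estimate of E_p[ln p(X) - ln q(X)], with q = N(0, I).
rng = np.random.default_rng(42)

mu = np.array([0.5, -0.2])
Sigma = np.array([[1.2, 0.3],
                  [0.3, 0.8]])
k = len(mu)

closed_form = 0.5 * (np.trace(Sigma) + mu @ mu - k - np.log(np.linalg.det(Sigma)))

# Draw from p and average the log density ratio
L = np.linalg.cholesky(Sigma)
X = mu + rng.standard_normal((200_000, k)) @ L.T
Sigma_inv = np.linalg.inv(Sigma)
d = X - mu
log_p = (-0.5 * np.einsum('ij,jk,ik->i', d, Sigma_inv, d)
         - 0.5 * np.log(np.linalg.det(Sigma)) - 0.5 * k * np.log(2 * np.pi))
log_q = -0.5 * np.einsum('ij,ij->i', X, X) - 0.5 * k * np.log(2 * np.pi)
mc_estimate = np.mean(log_p - log_q)

print(closed_form, mc_estimate)
```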

Using this notation, we replace (5.46) with the stochastic analog

    J(x) = min_u max_{ψ∈P} { x' R x + u' Q u + β [ ∫ J(A x + B u + C w) ψ(dw) − θ D_KL(ψ, φ) ] }    (5.64)

Here P represents the set of all densities on R^n and φ is the benchmark distribution N(0, I). The distribution ψ is chosen as the least desirable distribution in terms of next period outcomes, while taking into account the penalty term θ D_KL(ψ, φ). This penalty term plays a role analogous to the one played by the deterministic penalty θ w'w in (5.46), since it discourages large deviations from the benchmark.

Solving the Model The maximization problem in (5.64) appears highly nontrivial: after all, we are maximizing over an infinite dimensional space consisting of the entire set of densities. However, it turns out that the solution is fully tractable, and in fact also falls within the class of normal distributions. First, we note that J has the form J(x) = x'P x + d for some positive definite matrix P and constant real number d. Moreover, it turns out that if (I − θ^{-1} C'P C)^{-1} is nonsingular, then

    max_{ψ∈P} { ∫ (A x + B u + C w)' P (A x + B u + C w) ψ(dw) − θ D_KL(ψ, φ) }
        = (A x + B u)' D(P) (A x + B u) + κ(θ, P)

where

    κ(θ, P) := θ ln[ det(I − θ^{-1} C'P C)^{-1} ]

and the maximizer is the Gaussian distribution

    ψ = N( (θ I − C'P C)^{-1} C'P (A x + B u),  (I − θ^{-1} C'P C)^{-1} )    (5.65)

Substituting the expression for the maximum into the Bellman equation (5.64) and using J(x) = x'P x + d gives

    x'P x + d = min_u { x' R x + u' Q u + β (A x + B u)' D(P) (A x + B u) + β [ d + κ(θ, P) ] }    (5.66)

Since constant terms do not affect minimizers, the solution is the same as (5.48), leading to

    x'P x + d = x' B(D(P)) x + β [ d + κ(θ, P) ]

To solve this Bellman equation, we take P̂ to be the positive definite fixed point of B ∘ D. In addition, we take d̂ as the real number solving d = β [ d + κ(θ, P) ], which is

    d̂ := β (1 − β)^{-1} κ(θ, P̂)    (5.67)

The robust policy in this stochastic case is the minimizer in (5.66), which is once again u = −F̂ x for F̂ given by (5.49). Substituting the robust policy into (5.65), we obtain the worst-case shock distribution

    w_{t+1} ~ N( K̂ x_t,  (I − θ^{-1} C'P̂ C)^{-1} )

where K̂ is given by (5.50). Note that the mean of the worst-case shock distribution is equal to the value of the worst-case w_{t+1} from the deterministic setting.

Computing Other Quantities Before turning to implementation, we briefly outline how to compute several other quantities of interest.

Worst-Case Value of a Policy One thing we will be interested in doing is holding a policy fixed and computing the discounted loss associated with that policy. So let F be a given policy and let J_F(x) be the associated loss, which, by analogy with (5.64), satisfies

    J_F(x) = max_{ψ∈P} { x' (R + F'Q F) x + β [ ∫ J_F((A − B F) x + C w) ψ(dw) − θ D_KL(ψ, φ) ] }

Writing J_F(x) = x'P_F x + d_F and applying the same argument used above, we get

    x'P_F x + d_F = x' (R + F'Q F) x + β [ x' (A − B F)' D(P_F) (A − B F) x + d_F + κ(θ, P_F) ]

To solve this we take P_F to be the solution of

    P = R + F'Q F + β (A − B F)' D(P) (A − B F)    (5.68)

and

    d_F := β (1 − β)^{-1} κ(θ, P_F) = β (1 − β)^{-1} θ ln[ det(I − θ^{-1} C'P_F C)^{-1} ]

If you skip ahead to the appendix, you will be able to verify that P_F is the solution to the Bellman equation in agent 2's problem discussed above; we use this in our computations.

Implementation
The following is an implementation of robust LQ optimal control


""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: robustlq.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski LastModified: 28/01/2014 Solves robust LQ control problems. """ from __future__ import division # Remove for Python 3.sx import numpy as np from lqcontrol import LQ from quadsums import var_quadratic_sum from numpy import dot, log, sqrt, identity, hstack, vstack, trace from scipy.linalg import solve, inv, det, solve_discrete_lyapunov class RBLQ: """ Provides methods for analysing infinite horizon robust LQ control problems of the form min_{u_t} subject to x_{t+1} = A x_t + B u_t + C w_{t+1} and with model misspecification parameter theta. """ def __init__(self, Q, R, A, B, C, beta, theta): """ Sets up the robust control problem. Parameters ========== Q, R : array_like, dtype = float The matrices R and Q from the objective function A, B, C : array_like, dtype = float The matrices A, B, and C from the state space system beta, theta : scalar, float The discount and robustness factors in the robust control problem We assume that * * * * * R Q A B C is is is is is n k n n n x x x x x n, symmetric and nonnegative definite k, symmetric and positive definite n k j sum_t beta^t {x_t' R x_t + u'_t Q u_t }


""" # == Make sure all matrices can be treated as 2D arrays == # A, B, C, Q, R = map(np.atleast_2d, (A, B, C, Q, R)) self.A, self.B, self.C, self.Q, self.R = A, B, C, Q, R # == Record dimensions == # self.k = self.Q.shape[0] self.n = self.R.shape[0] self.j = self.C.shape[1] # == Remaining parameters == # self.beta, self.theta = beta, theta def d_operator(self, P): """ The D operator, mapping P into D(P) := P + PC(theta I - C'PC)^{-1} C'P. Parameters ========== P : array_like A self.n x self.n array """ C, theta = self.C, self.theta I = np.identity(self.j) S1 = dot(P, C) S2 = dot(C.T, S1) return P + dot(S1, solve(theta * I - S2, S1.T)) def b_operator(self, P): """ The B operator, mapping P into B(P) := R - beta^2 A'PB (Q + beta B'PB)^{-1} B'PA + beta A'PA and also returning F := (Q + beta B'PB)^{-1} beta B'PA Parameters ========== P : array_like An self.n x self.n array """ A, B, Q, R, beta = self.A, self.B, self.Q, self.R, self.beta S1 = Q + beta * dot(B.T, dot(P, B)) S2 = beta * dot(B.T, dot(P, A)) S3 = beta * dot(A.T, dot(P, A)) F = solve(S1, S2) new_P = R - dot(S2.T, solve(S1, S2)) + S3 return F, new_P


    def robust_rule(self):
        """
        This method solves the robust control problem by tricking it into a
        stacked LQ problem, as described in chapter 2 of Hansen-Sargent's
        text "Robustness."

        The optimal control with observed state is

            u_t = - F x_t

        and the value function is -x'Px.

        Returns
        =======
        F : array_like, dtype = float
            The optimal control matrix from above
        P : array_like, dtype = float
            The positive semi-definite matrix defining the value function
        K : array_like, dtype = float
            The worst-case shock matrix K, where :math:`w_{t+1} = K x_t` is
            the worst case shock
        """
        # == Simplify names == #
        A, B, C, Q, R = self.A, self.B, self.C, self.Q, self.R
        beta, theta = self.beta, self.theta
        k, j = self.k, self.j
        # == Set up LQ version == #
        I = identity(j)
        Z = np.zeros((k, j))
        Ba = hstack([B, C])
        Qa = vstack([hstack([Q, Z]), hstack([Z.T, -beta*I*theta])])
        lq = LQ(Qa, R, A, Ba, beta=beta)
        # == Solve and convert back to robust problem == #
        P, f, d = lq.stationary_values()
        F = f[:k, :]
        K = -f[k:f.shape[0], :]
        return F, K, P

    def robust_rule_simple(self, P_init=None, max_iter=80, tol=1e-8):
        """
        A simple algorithm for computing the robust policy F and the
        corresponding value function P, based around straightforward
        iteration with the robust Bellman operator.  This function is
        easier to understand but one or two orders of magnitude slower
        than self.robust_rule().  For more information see the docstring
        of that method.
        """
        # == Simplify names == #
        A, B, C, Q, R = self.A, self.B, self.C, self.Q, self.R
        beta, theta = self.beta, self.theta
        # == Set up loop == #
        P = np.zeros((self.n, self.n)) if P_init is None else P_init


iterate, e = 0, tol + 1 while iterate < max_iter and e > tol: F, new_P = self.b_operator(self.d_operator(P)) e = np.sqrt(np.sum((new_P - P)**2)) iterate += 1 P = new_P I = np.identity(self.j) S1 = P.dot(C) S2 = C.T.dot(S1) K = inv(theta * I - S2).dot(S1.T).dot(A - B.dot(F)) return F, K, P def F_to_K(self, F): """ Compute agent 2's best cost-minimizing response K, given F. Parameters ========== F : array_like A self.k x self.n array Returns ======= K : array_like, dtype = float P : array_like, dtype = float """ Q2 = self.beta * self.theta R2 = - self.R - dot(F.T, dot(self.Q, F)) A2 = self.A - dot(self.B, F) B2 = self.C lq = LQ(Q2, R2, A2, B2, beta=self.beta) P, neg_K, d = lq.stationary_values() return - neg_K, P def K_to_F(self, K): """ Compute agent 1's best value-maximizing response F, given K. Parameters ========== K : array_like A self.j x self.n array Returns ======= F : array_like, dtype = float P : array_like, dtype = float """ A1 = self.A + dot(self.C, K) B1 = self.B Q1 = self.Q


R1 = self.R - self.beta * self.theta * dot(K.T, K) lq = LQ(Q1, R1, A1, B1, beta=self.beta) P, F, d = lq.stationary_values() return F, P def compute_deterministic_entropy(self, F, K, x0): """ Given K and F, compute the value of deterministic entropy, which is sum_t beta^t x_t' K'K x_t with x_{t+1} = (A - BF + CK) x_t. """ H0 = dot(K.T, K) C0 = np.zeros((self.n, 1)) A0 = self.A - dot(self.B, F) + dot(self.C, K) e = var_quadratic_sum(A0, C0, H0, self.beta, x0) return e def evaluate_F(self, F): """ Given a fixed policy F, with the interpretation u = -F x, this function computes the matrix P_F and constant d_F associated with discounted cost J_F(x) = x' P_F x + d_F. Parameters ========== F : array_like A self.k x self.n array Returns ======= P_F : array_like, dtype = float Matrix for discounted cost d_F : scalar Constant for discounted cost K_F : array_like, dtype = float Worst case policy O_F : array_like, dtype = float Matrix for discounted entropy o_F : scalar Constant for discounted entropy """ # == Simplify names == # Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C beta, theta = self.beta, self.theta # == Solve for policies and costs using agent 2's problem == # K_F, neg_P_F = self.F_to_K(F) P_F = - neg_P_F I = np.identity(self.j)


H = inv(I - C.T.dot(P_F.dot(C)) / theta) d_F = log(det(H)) # == Compute O_F and o_F == # sig = -1.0 / theta AO = sqrt(beta) * (A - dot(B, F) + dot(C, K_F)) O_F = solve_discrete_lyapunov(AO.T, beta * dot(K_F.T, K_F)) ho = (trace(H - 1) - d_F) / 2.0 tr = trace(dot(O_F, C.dot(H.dot(C.T)))) o_F = (ho + beta * tr) / (1 - beta) return K_F, P_F, d_F, O_F, o_F

Here is a brief description of the methods of the class:

- RBLQ.d_operator() and RBLQ.b_operator() implement D and B respectively
- RBLQ.robust_rule() and RBLQ.robust_rule_simple() both solve for the triple F̂, K̂, P̂, as described in equations (5.49) (5.50) and the surrounding discussion
  - RBLQ.robust_rule() is more efficient
  - RBLQ.robust_rule_simple() is more transparent and easier to follow
- RBLQ.K_to_F() and RBLQ.F_to_K() solve the decision problems of agent 1 and agent 2 respectively
- RBLQ.compute_deterministic_entropy() computes the left-hand side of (5.55)
- RBLQ.evaluate_F() computes the loss and entropy associated with a given policy (see this discussion)

Application
Let us consider a monopolist similar to this one, but now facing model uncertainty.

The inverse demand function is p_t = a_0 − a_1 y_t + d_t, where

    d_{t+1} = ρ d_t + σ_d w_{t+1},    {w_t} iid, N(0, 1)

The period return function for the monopolist is

    r_t = p_t y_t − γ (y_{t+1} − y_t)² / 2 − c y_t

Its objective is to maximize expected discounted profits, or, equivalently, to minimize E Σ_{t=0}^∞ β^t (−r_t).

To form a linear regulator problem, we take the state and control to be

    x_t = (1, y_t, d_t)'    and    u_t = y_{t+1} − y_t


Setting b := (a_0 − c)/2, we define

    R = [ 0     b     0
          b    −a_1   1/2
          0    1/2    0 ]

and Q = γ/2.

For the transition matrices we set

    A = [ 1  0  0        B = [ 0        C = [ 0
          0  1  0              1              0
          0  0  ρ ],           0 ],           σ_d ]

Our aim is to compute the value-entropy correspondences shown above. The parameters are

    a_0 = 100, a_1 = 0.5, ρ = 0.9, σ_d = 0.05, β = 0.95, c = 2, γ = 50.0

The standard normal distribution for w_t is understood as the agent's baseline, with uncertainty parameterized by θ. We compute value-entropy correspondences for two policies:

1. The no concern for robustness policy F_0, which is the ordinary LQ loss minimizer
2. A moderate concern for robustness policy F_b, with θ = 0.02

The following code produces the graph shown above, with blue being for the robust policy
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: robust_monopolist.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski LastModified: 28/01/2014 The robust control problem for a monopolist with adjustment costs. The inverse demand curve is: p_t = a_0 - a_1 y_t + d_t where d_{t+1} = \rho d_t + \sigma_d w_{t+1} for w_t ~ N(0,1) and iid. The period return function for the monopolist is r_t = p_t y_t - gamma (y_{t+1} - y_t)^2 / 2 - c y_t

The objective of the firm is E_t \sum_{t=0}^\infty \beta^t r_t For the linear regulator, we take the state and control to be x_t = (1, y_t, d_t) and u_t = y_{t+1} - y_t """ from __future__ import division from robustlq import RBLQ


from lqcontrol import LQ import pandas as pd import numpy as np from scipy.linalg import eig from scipy import interp import matplotlib.pyplot as plt # == model parameters == # a_0 a_1 rho sigma_d beta c gamma = = = = = = = 100 0.5 0.9 0.05 0.95 2 50.0

theta = 0.002 ac = (a_0 - c) / 2.0 # == Define LQ matrices == # R = np.array([[0, ac, 0], [ac, -a_1, 0.5], [0., 0.5, 0]]) R = -R # For minimization Q = gamma / 2 A = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., rho]]) B = np.array([[0.], [1.], [0.]]) C = np.array([[0.], [0.], [sigma_d]]) #-----------------------------------------------------------------------------# # Functions #-----------------------------------------------------------------------------# def evaluate_policy(theta, F): """ Given theta (scalar, dtype=float) and policy F (array_like), returns the value associated with that policy under the worst case path for {w_t}, as well as the entropy level. """ rlq = RBLQ(Q, R, A, B, C, beta, theta) K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)


x0 = np.array([[1.], [0.], [0.]]) value = - x0.T.dot(P_F.dot(x0)) - d_F entropy = x0.T.dot(O_F.dot(x0)) + o_F return map(float, (value, entropy)) def value_and_entropy(emax, F, bw, grid_size=1000): """ Compute the value function and entropy levels for a theta path increasing until it reaches the specified target entropy value. Parameters ========== emax : scalar The target entropy value F : array_like The policy function to be evaluated bw : str A string specifying whether the implied shock path follows best or worst assumptions. The only acceptable values are 'best' and 'worst'. Returns ======= df : pd.DataFrame A pandas DataFrame containing the value function and entropy values up to the emax parameter. The columns are 'value' and 'entropy'. """ if bw == 'worst': thetas = 1 / np.linspace(1e-8, 1000, grid_size) else: thetas = -1 / np.linspace(1e-8, 1000, grid_size) df = pd.DataFrame(index=thetas, columns=('value', 'entropy')) for theta in thetas: df.ix[theta] = evaluate_policy(theta, F) if df.ix[theta, 'entropy'] >= emax: break df = df.dropna(how='any') return df #-----------------------------------------------------------------------------# # Main #-----------------------------------------------------------------------------#


# == Compute the optimal rule == # optimal_lq = LQ(Q, R, A, B, C, beta) Po, Fo, do = optimal_lq.stationary_values() # == Compute a robust rule given theta == # baseline_robust = RBLQ(Q, R, A, B, C, beta, theta) Fb, Kb, Pb = baseline_robust.robust_rule() # == Check the positive definiteness of worst-case covariance matrix to == # # == ensure that theta exceeds the breakdown point == # test_matrix = np.identity(Pb.shape[0]) - np.dot(C.T, Pb.dot(C)) / theta eigenvals, eigenvecs = eig(test_matrix) assert (eigenvals >= 0).all(), 'theta below breakdown point.' emax = 1.6e6 optimal_best_case = value_and_entropy(emax, Fo, 'best') robust_best_case = value_and_entropy(emax, Fb, 'best') optimal_worst_case = value_and_entropy(emax, Fo, 'worst') robust_worst_case = value_and_entropy(emax, Fb, 'worst') fig, ax = plt.subplots() ax.set_xlim(0, emax) ax.set_ylabel("Value") ax.set_xlabel("Entropy") ax.grid() for axis in 'x', 'y': plt.ticklabel_format(style='sci', axis=axis, scilimits=(0,0)) plot_args = {'lw' : 2, 'alpha' : 0.7} colors = 'r', 'b' df_pairs = ((optimal_best_case, optimal_worst_case), (robust_best_case, robust_worst_case)) class Curve: def __init__(self, x, y): self.x, self.y = x, y def __call__(self, z): return interp(z, self.x, self.y) for c, df_pair in zip(colors, df_pairs): curves = [] for df in df_pair: # == Plot curves == # x, y = df['entropy'], df['value']


        x, y = (np.asarray(a, dtype='float') for a in (x, y))
        egrid = np.linspace(0, emax, 100)
        curve = Curve(x, y)
        ax.plot(egrid, curve(egrid), color=c, **plot_args)
        curves.append(curve)
    # == Color fill between curves == #
    ax.fill_between(egrid,
                    curves[0](egrid),
                    curves[1](egrid),
                    color=c, alpha=0.1)

plt.show()

Here's another one, with θ = 0.002 instead of 0.02

Can you explain the different shape of the value-entropy correspondence for the robust policy?

Appendix
, ) = K , We sketch the proof only of the rst claim in this section, which is that, for any given , K ( F where K is as given in (5.50) This is the content of the next lemma is the xed point of the map B D and F is the robust policy as given in (5.49), then Lemma. If P , ) = ( I C PC ) 1 C P ( A BF ) K(F (5.69)

, the Bellman equation associated with the LQ Proof: As a rst step, observe that when F = F problem (5.53) (5.54) is = R F QF 2 ( A B F ) PC ( I + C PC ) 1 C P ( A BF ) + ( A B F ) P ( A BF ) (5.70) P

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.7. LINEAR STOCHASTIC MODELS

356

(revisit this discussion if you dont know where (5.70) comes from) and the optimal policy is ) 1 C P ( A BF ) xt wt+1 = ( I + C PC solves the Bellman equation (5.70) Suppose for a moment that P In this case the policy becomes ) 1 C P ( A BF ) xt wt+1 = ( I C PC which is exactly the claim in (5.69) solves (5.70), or, in other words, Hence it remains only to show that P = R+F QF + ( A B F ) PC ( I + C PC ) 1 C P ( A BF ) + ( A B F ) P ( A BF ) P Using the denition of D , we can rewrite the right-hand side more simply as QF + ( A B F ) D(P )( A B F ) R+F Although it involves a substantial amount of algebra, it can be shown that the latter is just P = B (D ( P ))) (Hint: Use the fact that P

5.7 Linear Stochastic Models


Overview
In this lecture we study linear stochastic processes, a class of models routinely used to study economic and financial time series. This class has the advantage of being

1. broad in terms of the kinds of dynamics it can represent
2. simple enough to be described by an elegant and comprehensive theory

We consider linear stochastic models in both the time and frequency domain. Computational topics include fast Fourier transforms, calculation of spectral densities, etc.

For supplementary reading, see

- RMT3, chapter 2
- [Sargent1987], chapter 11
- John Cochrane's notes on time series analysis, chapter 8
- [Shiryaev1995], chapter 6
- [CryerChan2008], all


Covariance Stationary Processes


Consider a sequence of random variables {X_t} indexed by t ∈ Z and taking values in R. Thus, {X_t} begins in the infinite past and extends to the infinite future, a convenient and standard assumption. As in other fields, successful economic modeling typically requires identifying some deep structure in this process that is relatively constant over time. If such structure can be found, then each new observation X_t, X_{t+1}, ... provides additional information about it, which is how we learn from data. For this reason, we will focus in what follows on processes that are stationary, or become so after some transformation (differencing, cointegration, etc.).

Definitions A real-valued stochastic process {X_t} is called covariance stationary if

1. Its mean μ := E X_t does not depend on t
2. For all k in Z, the k-th autocovariance γ(k) := E (X_t − μ)(X_{t+k} − μ) is finite and depends only on k

The function γ : Z → R is called the autocovariance function of the process. Throughout this lecture, we will work exclusively with zero-mean (i.e., μ = 0) covariance stationary processes. The zero-mean assumption costs nothing in terms of generality, since working with non-zero-mean processes involves no more than adding a constant.

Example 1: White Noise Perhaps the simplest class of covariance stationary processes is the white noise processes. A process {ε_t} is called a white noise process if

1. E ε_t = 0
2. γ(k) = σ² 1{k = 0} for some σ > 0

(Here 1{k = 0} is defined to be 1 if k = 0 and zero otherwise.)

Example 2: General Linear Processes From the simple building block provided by white noise, we can construct a very flexible family of covariance stationary processes, the general linear processes

    X_t = Σ_{j=0}^∞ ψ_j ε_{t−j},    t ∈ Z    (5.71)

where

- {ε_t} is white noise
- {ψ_t} is a square summable sequence in R (that is, Σ_{t=0}^∞ ψ_t² < ∞)


The sequence {t } is often called a linear lter With some manipulations it is possible to conrm that the autocovariance function for (5.71) is ( k ) = 2 j j+k
j =0

(5.72)

By the Cauchy-Schwarz inequality one can show that the last expression is finite. Clearly it does not depend on t.

Wold's Decomposition

Remarkably, the class of general linear processes goes a long way towards describing the entire class of zero-mean covariance stationary processes. In particular, Wold's theorem states that every zero-mean covariance stationary process {X_t} can be written as

    X_t = ∑_{j=0}^∞ ψ_j ε_{t−j} + η_t

where
• {ε_t} is white noise
• {ψ_t} is square summable
• η_t can be expressed as a linear function of X_{t−1}, X_{t−2}, . . . and is perfectly predictable over arbitrarily long horizons

For intuition and further discussion, see [Sargent1987], p. 286.

AR and MA

General linear processes are a very broad class of processes, and it often pays to specialize to those for which there exists a representation having only finitely many parameters. (In fact, experience shows that models with a relatively small number of parameters typically perform better than larger models, especially for forecasting.)

One very simple example of such a model is the AR(1) process

    X_t = φ X_{t−1} + ε_t,   where |φ| < 1 and {ε_t} is white noise   (5.73)

By direct substitution, it is easy to verify that X_t = ∑_{j=0}^∞ φ^j ε_{t−j}. Hence {X_t} is a general linear process.

Applying (5.72) to the previous expression for X_t, we get the AR(1) autocovariance function

    γ(k) = φ^k σ² / (1 − φ²),   k = 0, 1, . . .   (5.74)
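Since the AR(1) has ψ_j = φ^j, the truncated sum in (5.72) must agree with the closed form (5.74); a short check (our own, with hypothetical parameter values):

```python
import numpy as np

phi, sigma = 0.8, 1.0
J = 200                            # truncation point for the sum in (5.72)
psi = phi ** np.arange(J)          # psi_j = phi**j for the AR(1)

for k in range(5):
    lhs = sigma**2 * np.sum(psi[:J-k] * psi[k:])   # (5.72), truncated at J terms
    rhs = phi**k * sigma**2 / (1 - phi**2)         # closed form (5.74)
    assert abs(lhs - rhs) < 1e-10
```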

The next figure plots this function for φ = 0.8 and φ = −0.8.

Another very simple process is the MA(1) process

    X_t = ε_t + θ ε_{t−1}

You will be able to verify that

    γ(0) = σ²(1 + θ²),   γ(1) = σ²θ,   and   γ(k) = 0   for all   k > 1

The AR(1) can be generalized to an AR(p), and likewise for the MA(1). Putting all of this together, we get the

ARMA Processes

A stochastic process {X_t} is called an autoregressive moving average process, or ARMA(p, q), if it can be written as

    X_t = φ_1 X_{t−1} + ... + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}   (5.75)

where {ε_t} is white noise.

There is an alternative notation for ARMA processes in common use, based around the lag operator L.

Def. Given arbitrary variable Y_t, let L^k Y_t := Y_{t−k}.

It turns out that lag operators can lead to very succinct expressions for linear stochastic processes: algebraic manipulations treating the lag operator as an ordinary scalar often are legitimate.

Using L, we can rewrite (5.75) as

    L^0 X_t − φ_1 L^1 X_t − ... − φ_p L^p X_t = L^0 ε_t + θ_1 L^1 ε_t + ... + θ_q L^q ε_t   (5.76)

If we let φ(z) and θ(z) be the polynomials

    φ(z) := 1 − φ_1 z − ... − φ_p z^p   and   θ(z) := 1 + θ_1 z + ... + θ_q z^q   (5.77)

then (5.76) simplifies further to

    φ(L) X_t = θ(L) ε_t   (5.78)

In what follows we always assume that the roots of the polynomial φ(z) lie outside the unit circle in the complex plane. This condition is sufficient to guarantee that the ARMA(p, q) process falls within the class of general linear processes described above. In particular, given an ARMA(p, q) process {X_t} satisfying this condition, there exists a square summable sequence {ψ_t} with X_t = ∑_{j=0}^∞ ψ_j ε_{t−j} for all t. The sequence {ψ_t} can be obtained by a recursive procedure outlined on page 79 of [CryerChan2008]. In this context, the function t ↦ ψ_t is often called the impulse response function.
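As an aside (our own sketch, not the lecture's code), both the root condition and the recursive construction of {ψ_t} are easy to check numerically; the recursion used below is the standard one, ψ_0 = 1 and ψ_j = θ_j + ∑_{i=1}^{min(j,p)} φ_i ψ_{j−i}, with θ_j := 0 for j > q, and the parameter values are hypothetical:

```python
import numpy as np

phi = [0.5]          # AR coefficients phi_1, ..., phi_p
theta = [0.0, -0.8]  # MA coefficients theta_1, ..., theta_q

# Check that the root of phi(z) = 1 - 0.5 z lies outside the unit circle.
# np.roots expects the highest-degree coefficient first: phi(z) = -0.5 z + 1
roots = np.roots([-0.5, 1])
assert all(abs(r) > 1 for r in roots)    # here the single root is z = 2

# Recursive computation of the impulse response
J = 10
psi = np.zeros(J)
psi[0] = 1.0
for j in range(1, J):
    psi[j] = theta[j-1] if j-1 < len(theta) else 0.0
    for i in range(1, min(j, len(phi)) + 1):
        psi[j] += phi[i-1] * psi[j-i]
```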

Spectral Analysis
Autocovariance functions provide a great deal of information about covariance stationary processes. In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire joint distribution. Even for non-Gaussian processes, it provides a significant amount of information.

It turns out that there is an alternative representation of the autocovariance function of a covariance stationary process, called the spectral density. At times, the spectral density is easier to derive, easier to manipulate, and provides additional intuition.

Complex Numbers

Before discussing the spectral density, we invite you to recall the main properties of complex numbers (or skip to the next section).

It can be helpful to remember that, in a formal sense, complex numbers are just points (x, y) ∈ R² endowed with a specific notion of multiplication. When (x, y) is regarded as a complex number, x is called the real part and y is called the imaginary part. The modulus or absolute value of a complex number z = (x, y) is just its Euclidean norm in R², but is usually written as |z| instead of ‖z‖.

The product of two complex numbers (x, y) and (u, v) is defined to be (xu − vy, xv + yu), while addition is standard pointwise vector addition. When endowed with these notions of multiplication and addition, the set of complex numbers forms a field: addition and multiplication play well together, just as they do in R.

The complex number (x, y) is often written as x + iy, where i is called the imaginary unit, and is understood to obey i² = −1.


The x + iy notation can be thought of as an easy way to remember the definition of multiplication given above, because, proceeding naively,

    (x + iy)(u + iv) = xu − yv + i(xv + yu)

Converted back to our first notation, this becomes (xu − vy, xv + yu), which is the same as the product of (x, y) and (u, v) from our previous definition.

Complex numbers are also sometimes expressed in their polar form r e^{iθ}, which should be interpreted as r e^{iθ} := r(cos(θ) + i sin(θ)).

Spectral Densities

Let {X_t} be a covariance stationary process with autocovariance function γ satisfying ∑_k γ(k)² < ∞. The spectral density f of {X_t} is defined as the discrete time Fourier transform of its autocovariance function γ:

    f(ω) := ∑_{k∈Z} γ(k) e^{−iωk},   ω ∈ R
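Python's built-in complex type implements exactly these operations; a brief illustration (our own aside):

```python
import cmath

z, w = complex(1, 2), complex(3, -1)   # (x, y) = (1, 2) and (u, v) = (3, -1)

# Product: (xu - yv, xv + yu) = (1*3 - 2*(-1), 1*(-1) + 2*3) = (5, 5)
assert z * w == complex(5, 5)

# The modulus is the Euclidean norm of (x, y)
assert abs(abs(z) - (1**2 + 2**2) ** 0.5) < 1e-12

# Polar form r e^{i theta}
r, theta = cmath.polar(z)
assert abs(r * cmath.exp(1j * theta) - z) < 1e-12
```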

(Some authors normalize the expression on the right by constants such as 1/π; the chosen convention makes little difference provided you are consistent.)

Using the fact that γ is even, in the sense that γ(t) = γ(−t) for all t, you should be able to show that

    f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk)   (5.79)

It is not difficult to confirm that f is real-valued, even (f(ω) = f(−ω)), and 2π-periodic, in the sense that f(2π + ω) = f(ω) for all ω. It follows that the values of f on [0, π] determine the values of f on all of R; the proof is an exercise. For this reason it is standard to plot the spectral density only on the interval [0, π].

Example 1: White Noise

Consider a white noise process {ε_t} with standard deviation σ. It is simple to check that in this case we have f(ω) = σ². In particular, f is a constant function. As we will see, this can be interpreted as meaning that all frequencies are equally present. (White light has this property when frequency refers to the visible spectrum, a connection that provides the origins of the term "white noise".)
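These properties can be confirmed numerically; the sketch below (our own, using the AR(1) autocovariances from (5.74) with hypothetical parameter values) evaluates the truncated sum in (5.79) and checks evenness and 2π-periodicity:

```python
import numpy as np

# Autocovariance of the AR(1) from (5.74), with phi = 0.8 and sigma = 1
phi = 0.8
gamma = lambda k: phi**np.abs(k) / (1 - phi**2)

def f(w, K=200):
    # Spectral density via (5.79), truncating the sum at K terms
    k = np.arange(1, K)
    return gamma(0) + 2 * np.sum(gamma(k) * np.cos(w * k))

assert abs(f(1.0) - f(-1.0)) < 1e-8             # even
assert abs(f(1.0) - f(1.0 + 2 * np.pi)) < 1e-8  # 2 pi periodic
```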


Example 2: AR and MA and ARMA

It is an exercise to show that the MA(1) process X_t = θ ε_{t−1} + ε_t has spectral density

    f(ω) = σ²(1 + 2θ cos(ω) + θ²)   (5.80)

With a bit more effort, it's possible to show (see, e.g., p. 261 of [Sargent1987]) that the spectral density of the AR(1) process X_t = φ X_{t−1} + ε_t is

    f(ω) = σ² / (1 − 2φ cos(ω) + φ²)   (5.81)

More generally, it can be shown that the spectral density of the ARMA process (5.75) is

    f(ω) = |θ(e^{iω}) / φ(e^{iω})|² σ²   (5.82)

where
• σ is the standard deviation of the white noise process {ε_t}
• the polynomials φ(·) and θ(·) are as defined in (5.77)

The derivation of (5.82) uses the fact that convolutions become products under Fourier transformations. The proof is elegant and can be found in many places; see, for example, [Sargent1987], chapter 11, section 4. It's a nice exercise to verify that (5.80) and (5.81) are indeed special cases of (5.82).

Interpreting the Spectral Density

Plotting (5.81) reveals the shape of the spectral density for the AR(1) model when φ takes the values 0.8 and −0.8 respectively.
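Before turning to the plots, the special-case claim above can be checked in a few lines (our own sketch): evaluate (5.82) with θ(z) = 1 and φ(z) = 1 − φz and compare it with the AR(1) closed form (5.81):

```python
import numpy as np

phi, sigma = 0.8, 1.0
w = np.linspace(0, np.pi, 50)

# AR(1) closed form (5.81)
f_ar1 = sigma**2 / (1 - 2 * phi * np.cos(w) + phi**2)

# General ARMA formula (5.82) with theta(z) = 1, phi(z) = 1 - phi z
z = np.exp(1j * w)
f_arma = np.abs(1 / (1 - phi * z))**2 * sigma**2

assert np.allclose(f_ar1, f_arma)
```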


These spectral densities correspond to the autocovariance functions for the AR(1) process shown above. Informally, we think of the spectral density as being large at those ω ∈ [0, π] such that the autocovariance function exhibits significant cycles at this frequency.

To see the idea, let's consider why, in the lower panel of the preceding figure, the spectral density for the case φ = −0.8 is large at ω = π. Recall that the spectral density can be expressed as

    f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk) = γ(0) + 2 (σ² / (1 − φ²)) ∑_{k≥1} (−0.8)^k cos(ωk)   (5.83)

where the second equality uses (5.74) with φ = −0.8. When we evaluate this at ω = π, we get a large number because cos(πk) is large and positive when (−0.8)^k is positive, and large and negative when (−0.8)^k is negative. Hence the product is always large and positive, and hence the sum of the products on the right-hand side of (5.83) is large.

These ideas are illustrated in the next figure, which has k on the horizontal axis.

On the other hand, if we evaluate f(ω) at ω = π/3, then the cycles are not matched, the sequence γ(k) cos(ωk) contains both positive and negative terms, and hence the sum of these terms is much smaller. In summary, the spectral density is large at frequencies ω where the autocovariance function exhibits cycles.

Inverting the Transformation

We have just seen that the spectral density is useful in the sense that it provides a frequency-based perspective on the autocovariance structure of a covariance stationary process.


Another reason that the spectral density is useful is that it can be inverted to recover the autocovariance function via the inverse Fourier transform. In particular, for all k ∈ Z, we have

    γ(k) = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω   (5.84)
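A direct numerical check of (5.84) (our own sketch): approximate the integral with a composite trapezoid rule for the AR(1) spectral density (5.81) and compare with the closed-form autocovariances (5.74):

```python
import numpy as np

phi, sigma = 0.8, 1.0
w = np.linspace(-np.pi, np.pi, 20001)
f = sigma**2 / (1 - 2 * phi * np.cos(w) + phi**2)   # AR(1) spectral density (5.81)

def trapz(y, x):
    "Composite trapezoid rule"
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

for k in range(4):
    gamma_k = trapz(f * np.exp(1j * w * k), w).real / (2 * np.pi)   # (5.84)
    assert abs(gamma_k - phi**k * sigma**2 / (1 - phi**2)) < 1e-6   # matches (5.74)
```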

This is convenient in situations where the spectral density is easier to calculate and manipulate than the autocovariance function. (For example, the expression (5.82) for the ARMA spectral density is much easier to work with than the expression for the ARMA autocovariance.)

Mathematical Theory

This section is loosely based on [Sargent1987], p. 249-253, and included for those who would like a bit more insight into spectral densities and have at least some background in Hilbert space theory. Others should feel free to skip to the next section; none of this material is necessary to progress to computation.

Recall that every separable Hilbert space H has a countable orthonormal basis {h_k}. The nice thing about such a basis is that every f ∈ H satisfies

    f = ∑_k α_k h_k,   where   α_k := ⟨f, h_k⟩   (5.85)

Thus, f can be represented to any degree of precision by linearly combining basis vectors. The scalar sequence α = {α_k} is called the Fourier coefficients of f, and satisfies ∑_k |α_k|² < ∞. In other words, α is in ℓ², the set of square summable sequences.

Consider an operator T that maps α ∈ ℓ² into its expansion ∑_k α_k h_k ∈ H. The Fourier coefficients of Tα are just α = {α_k}, as you can verify by confirming that ⟨Tα, h_k⟩ = α_k.

Using elementary results from Hilbert space theory, it can be shown that
• T is one-to-one: if α and β are distinct in ℓ², then so are their expansions in H
• T is onto: if f ∈ H then its preimage in ℓ² is the sequence α given by α_k = ⟨f, h_k⟩
• T is a linear isometry: in particular, ⟨α, β⟩ = ⟨Tα, Tβ⟩

Summarizing these results, we say that any separable Hilbert space is isometrically isomorphic to ℓ².

In essence, this says that each separable Hilbert space we consider is just a different way of looking at the fundamental space ℓ².

With this in mind, let's specialize to a setting where
• γ ∈ ℓ² is the autocovariance function of a covariance stationary process, and f is the spectral density
• H = L², where L² is the set of square summable functions on the interval [−π, π], with inner product ⟨g, h⟩ = ∫_{−π}^{π} g(ω)h(ω) dω
• {h_k} = the orthonormal basis for L² given by the set of trigonometric functions

    h_k(ω) = e^{iωk} / √(2π),   k ∈ Z,   ω ∈ [−π, π]

Using the definition of T from above, we now have

    T γ = ∑_k γ(k) e^{iωk} / √(2π) = (1/√(2π)) f(ω)   (5.86)

In other words, apart from a scalar multiple, the spectral density is just a transformation of γ ∈ ℓ² under a certain linear isometry: a different way to view γ.

In particular, it is an expansion of the autocovariance function γ with respect to the trigonometric basis functions in L². As discussed above, the Fourier coefficients of Tγ are given by the sequence γ, and, in particular, γ(k) = ⟨Tγ, h_k⟩. Transforming this inner product into its integral expression and using (5.86) gives (5.84), justifying our earlier expression for the inverse transform.
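As a small numerical aside (our own, not part of the lecture), the orthonormality of the trigonometric basis can be verified by quadrature:

```python
import numpy as np

# Check that <h_j, h_k> = (1 / 2 pi) int e^{i w j} e^{-i w k} dw = 1{j = k}
w = np.linspace(-np.pi, np.pi, 10001)

def inner(j, k):
    integrand = np.exp(1j * w * j) * np.exp(-1j * w * k) / (2 * np.pi)
    # composite trapezoid rule over [-pi, pi]
    return np.sum((integrand[1:] + integrand[:-1]) * np.diff(w)) / 2

assert abs(inner(2, 2) - 1) < 1e-8   # unit norm
assert abs(inner(2, 3)) < 1e-8       # orthogonality
```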

Implementation
In this lecture, our main objective is to provide Python code to represent ARMA(p, q) processes visually via their
1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density

In addition to individual plots of these entities, we want to provide functionality to generate 2x2 plots containing all this information. In other words, we want to replicate the plots on pages 68-69 of RMT3. Here's an example corresponding to the model

    X_t = 0.5 X_{t−1} + ε_t − 0.8 ε_{t−2}

Before presenting the code, we make some brief comments on the implementation.

Comments on the Structure of the Program

To achieve our stated aims we will implement a class called linearProcess with methods that generate the desired plots. The call

    lp = linearProcess(phi, theta, sigma)

will create an instance lp that represents the ARMA(p, q) model

    X_t = φ_1 X_{t−1} + ... + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}

If phi and theta are arrays or sequences, then the interpretation will be
• phi holds the vector of parameters (φ_1, φ_2, ..., φ_p)
• theta holds the vector of parameters (θ_1, θ_2, ..., θ_q)

The parameter sigma is always a scalar, the standard deviation of the white noise. We will also permit phi and theta to be scalars, in which case the model will be interpreted as

    X_t = φ X_{t−1} + ε_t + θ ε_{t−1}

The two packages most useful for working with ARMA models are scipy.signal and numpy.fft.

The package scipy.signal expects the parameters to be passed in to its functions in a manner consistent with the alternative ARMA notation (5.78). For example, the impulse response sequence {ψ_t} discussed above can be obtained using scipy.signal.dimpulse, and the function call should be of the form

    times, psi = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)

where ma_poly and ar_poly correspond to the polynomials in (5.77); that is,
• ma_poly is the vector (1, θ_1, θ_2, ..., θ_q)
• ar_poly is the vector (1, −φ_1, −φ_2, ..., −φ_p)

To this end, we will also maintain the arrays ma_poly and ar_poly as instance data, with their values computed automatically from the values of phi and theta supplied by the user. If the user decides to change the value of either phi or theta ex-post by assignments such as

    lp.phi = (0.5, 0.2)
    lp.theta = (0, -0.1)

then we would like ma_poly and ar_poly to update automatically to reflect these new parameters. This will be achieved in our implementation by using Descriptors.

Computing the Autocovariance Function

As discussed above, for ARMA processes the spectral density has a simple representation that is relatively easy to calculate. Given this fact, the easiest way to obtain the autocovariance function is to recover it from the spectral density via the inverse Fourier transform. Here we will use NumPy's Fourier transform package np.fft, which wraps a standard Fortran-based package called FFTPACK.

A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given sequence A_0, A_1, . . . , A_{n−1} and returns the sequence a_0, a_1, . . . , a_{n−1} defined by

    a_k = (1/n) ∑_{t=0}^{n−1} A_t e^{ik2πt/n}

Thus, if we set A_t = f(ω_t), where f is the spectral density and ω_t := 2πt/n, then

    a_k = (1/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k} = (1/2π) (2π/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k},   ω_t := 2πt/n

For n sufficiently large, we then have

    a_k ≈ (1/2π) ∫_0^{2π} f(ω) e^{iωk} dω = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω

(You can check the last equality.) In view of (5.84) we have now shown that, for n sufficiently large, a_k ≈ γ(k), which is exactly what we want to compute.
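The argument above can be confirmed in a few lines (our own check, with hypothetical parameter values): sampling the AR(1) spectral density (5.81) on the Fourier grid and applying np.fft.ifft recovers the autocovariances (5.74):

```python
import numpy as np

phi, sigma = 0.5, 1.0
n = 2**12
w = 2 * np.pi * np.arange(n) / n                   # w_t = 2 pi t / n
f = sigma**2 / (1 - 2 * phi * np.cos(w) + phi**2)  # AR(1) spectral density (5.81)

a = np.fft.ifft(f).real                            # a_k, as derived above
for k in range(5):
    assert abs(a[k] - phi**k * sigma**2 / (1 - phi**2)) < 1e-3   # a_k ~ gamma(k)
```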


Code

Our implementation is as follows

    import numpy as np
    from numpy import conj, pi, real
    import matplotlib.pyplot as plt
    from scipy.signal import dimpulse, freqz, dlsim

    class linearProcess(object):
        """
        This class provides functions for working with scalar ARMA processes.  In
        particular, it defines methods for computing and plotting the
        autocovariance function, the spectral density, the impulse-response
        function and simulated time series.
        """

        def __init__(self, phi, theta=0, sigma=1):
            """
            This class represents scalar ARMA(p, q) processes.  The parameters phi
            and theta can be NumPy arrays, array-like sequences (lists, tuples) or
            scalars.  If phi and theta are scalars, then the model is understood
            to be

                X_t = phi X_{t-1} + epsilon_t + theta epsilon_{t-1}

            where {epsilon_t} is a white noise process with standard deviation
            sigma.  If phi and theta are arrays or sequences, then the
            interpretation is the ARMA(p, q) model

                X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} +
                      epsilon_t + theta_1 epsilon_{t-1} + ... + theta_q epsilon_{t-q}

            where

                * phi = (phi_1, phi_2, ..., phi_p)
                * theta = (theta_1, theta_2, ..., theta_q)
                * sigma is a scalar, the standard deviation of the white noise
            """
            self._phi, self._theta = phi, theta
            self.sigma = sigma
            self.set_params()

        def get_phi(self):
            return self._phi

        def get_theta(self):
            return self._theta

        def set_phi(self, new_value):
            self._phi = new_value
            self.set_params()

        def set_theta(self, new_value):
            self._theta = new_value
            self.set_params()

        phi = property(get_phi, set_phi)
        theta = property(get_theta, set_theta)

        def set_params(self):
            """
            Internally, scipy.signal works with systems of the form

                ar_poly(L) X_t = ma_poly(L) epsilon_t

            where L is the lag operator.  To match this, we set

                ar_poly = (1, -phi_1, -phi_2, ..., -phi_p)
                ma_poly = (1, theta_1, theta_2, ..., theta_q)

            In addition, ar_poly must be at least as long as ma_poly.  This can
            be achieved by padding it out with zeros when required.
            """
            # === set up ma_poly === #
            ma_poly = np.asarray(self._theta)
            self.ma_poly = np.insert(ma_poly, 0, 1)  # The array (1, theta)

            # === set up ar_poly === #
            if np.isscalar(self._phi):
                ar_poly = np.array(-self._phi)
            else:
                ar_poly = -np.asarray(self._phi)
            self.ar_poly = np.insert(ar_poly, 0, 1)  # The array (1, -phi)

            # === pad ar_poly with zeros if required === #
            if len(self.ar_poly) < len(self.ma_poly):
                temp = np.zeros(len(self.ma_poly) - len(self.ar_poly))
                self.ar_poly = np.hstack((self.ar_poly, temp))

        def impulse_response(self, impulse_length=30):
            """
            Get the impulse response corresponding to our model.  Returns psi,
            where psi[j] is the response at lag j.  Note: psi[0] is unity.
            """
            sys = self.ma_poly, self.ar_poly, 1
            times, psi = dimpulse(sys, n=impulse_length)
            psi = psi[0].flatten()  # Simplify return value into flat array
            return psi

        def spectral_density(self, two_pi=True, resolution=1200):
            """
            Compute the spectral density function over [0, pi] if two_pi is
            False and [0, 2 pi] otherwise.  The spectral density is the discrete
            time Fourier transform of the autocovariance function.  In
            particular,

                f(w) = sum_k gamma(k) exp(-ikw)

            where gamma is the autocovariance function and the sum is over the
            set of all integers.
            """
            w, h = freqz(self.ma_poly, self.ar_poly, worN=resolution, whole=two_pi)
            spect = h * conj(h) * self.sigma**2
            return w, spect

        def autocovariance(self, num_autocov=16):
            """
            Compute the autocovariance function over the integers
            range(num_autocov) using the spectral density and the inverse
            Fourier transform.
            """
            spect = self.spectral_density()[1]
            acov = np.fft.ifft(spect).real
            return acov[:num_autocov]  # num_autocov should be <= len(acov) / 2

        def simulation(self, ts_length=90):
            " Compute a simulated sample path. "
            sys = self.ma_poly, self.ar_poly, 1
            u = np.random.randn(ts_length, 1)
            vals = dlsim(sys, u)[1]
            return vals.flatten()

        def plot_impulse_response(self, ax=None, show=True):
            if show:
                fig, ax = plt.subplots()
            ax.set_title('Impulse response')
            yi = self.impulse_response()
            ax.stem(range(len(yi)), yi)
            ax.set_xlim(xmin=(-0.5))
            ax.set_ylim(min(yi)-0.1, max(yi)+0.1)
            ax.set_xlabel('time')
            ax.set_ylabel('response')
            if show:
                plt.show()

        def plot_spectral_density(self, ax=None, show=True):
            if show:
                fig, ax = plt.subplots()
            ax.set_title('Spectral density')
            w, spect = self.spectral_density(two_pi=False)
            ax.semilogy(w, spect)
            ax.set_xlim(0, pi)
            ax.set_ylim(0, np.max(spect))
            ax.set_xlabel('frequency')
            ax.set_ylabel('spectrum')
            if show:
                plt.show()

        def plot_autocovariance(self, ax=None, show=True):
            if show:
                fig, ax = plt.subplots()
            ax.set_title('Autocovariance')
            acov = self.autocovariance()
            ax.stem(range(len(acov)), acov)
            ax.set_xlim(-0.5, len(acov) - 0.5)
            ax.set_xlabel('time')
            ax.set_ylabel('autocovariance')
            if show:
                plt.show()

        def plot_simulation(self, ax=None, show=True):
            if show:
                fig, ax = plt.subplots()
            ax.set_title('Sample path')
            x_out = self.simulation()
            ax.plot(x_out)
            ax.set_xlabel('time')
            ax.set_ylabel('state space')
            if show:
                plt.show()

        def quad_plot(self):
            """
            Plots the impulse response, spectral_density, autocovariance, and
            one realization of the process.
            """
            num_rows, num_cols = 2, 2
            fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
            plt.subplots_adjust(hspace=0.4)
            self.plot_impulse_response(axes[0, 0], show=False)
            self.plot_spectral_density(axes[0, 1], show=False)
            self.plot_autocovariance(axes[1, 0], show=False)
            self.plot_simulation(axes[1, 1], show=False)
            plt.show()

As an example of usage, try

    phi = 0.5
    theta = 0, -0.8
    lp = linearProcess(phi, theta)
    lp.quad_plot()

5.8 Estimation of Spectra


Overview
In a previous lecture we covered some fundamental properties of covariance stationary linear stochastic processes. One objective for that lecture was to introduce spectral densities, a standard and very useful technique for analyzing such processes.

In this lecture we turn to the problem of estimating spectral densities and other related quantities from data. Estimates of the spectral density are computed using what is known as a periodogram, which in turn is computed via the famous fast Fourier transform. Once the basic technique has been explained, we will apply it to the analysis of several key macroeconomic time series.

For supplementary reading, see [Sargent1987] or [CryerChan2008].

Periodograms
Recall that the spectral density f of a covariance stationary process with autocovariance function γ can be written as

    f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk),   ω ∈ R

Now consider the problem of estimating the spectral density of a given time series, when γ is unknown. In particular, let X_0, . . . , X_{n−1} be n consecutive observations of a single time series that is assumed to be covariance stationary. The most common estimator of the spectral density of this process is the periodogram of X_0, . . . , X_{n−1}, which is defined as

    I(ω) := (1/n) |∑_{t=0}^{n−1} X_t e^{itω}|²,   ω ∈ R   (5.87)

(Recall that |z| denotes the modulus of complex number z.) Alternatively, I(ω) can be expressed as

    I(ω) = (1/n) [ (∑_{t=0}^{n−1} X_t cos(ωt))² + (∑_{t=0}^{n−1} X_t sin(ωt))² ]

It is straightforward to show that the function I is even and 2π-periodic (i.e., I(ω) = I(−ω) and I(ω + 2π) = I(ω) for all ω ∈ R). From these two results, you will be able to verify that the values of I on [0, π] determine the values of I on all of R.

The next section helps to explain the connection between the periodogram and the spectral density.

Interpretation

To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies

    ω_j := 2πj/n,   j = 0, . . . , n − 1

In what sense is I(ω_j) an estimate of f(ω_j)? The answer is straightforward, although it does involve some algebra. With a bit of effort one can show that, for any integer j > 0,

    ∑_{t=0}^{n−1} e^{itω_j} = ∑_{t=0}^{n−1} exp(i 2πj t/n) = 0

Letting X̄ denote the sample mean n⁻¹ ∑_{t=0}^{n−1} X_t, we then have

    n I(ω_j) = |∑_{t=0}^{n−1} (X_t − X̄) e^{itω_j}|² = ∑_{t=0}^{n−1} (X_t − X̄) e^{itω_j} ∑_{r=0}^{n−1} (X_r − X̄) e^{−irω_j}

By carefully working through the sums, one can transform this to

    n I(ω_j) = ∑_{t=0}^{n−1} (X_t − X̄)² + 2 ∑_{k=1}^{n−1} ∑_{t=k}^{n−1} (X_t − X̄)(X_{t−k} − X̄) cos(ω_j k)

Now let

    γ̂(k) := (1/n) ∑_{t=k}^{n−1} (X_t − X̄)(X_{t−k} − X̄),   k = 0, 1, . . . , n − 1

This is the sample autocovariance function, the natural "plug-in" estimator of the autocovariance function γ. ("Plug-in estimator" is an informal term for an estimator found by replacing expectations with sample means.) With this notation, we can now write

    I(ω_j) = γ̂(0) + 2 ∑_{k=1}^{n−1} γ̂(k) cos(ω_j k)

Recalling our expression for f given above, we see that I(ω_j) is just a sample analog of f(ω_j).

Calculation

Let's now consider how to compute the periodogram as defined in (5.87). There are already functions available that will do this for us; an example is statsmodels.tsa.stattools.periodogram in the statsmodels package. However, it is very simple to replicate their results, and this will give us a platform to make useful extensions.

The most common way to calculate the periodogram is via the discrete Fourier transform, which in turn is implemented through the fast Fourier transform algorithm. In general, given a sequence a_0, . . . , a_{n−1}, the discrete Fourier transform computes the sequence

    A_j := ∑_{t=0}^{n−1} a_t exp(−i 2π tj/n),   j = 0, . . . , n − 1
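Tying the last two subsections together, here is a check (our own, not from the lecture) that the DFT-based periodogram coincides with its sample-autocovariance representation at a Fourier frequency:

```python
import numpy as np

np.random.seed(0)
n = 64
X = np.random.randn(n)
Xbar = X.mean()

# Periodogram at all Fourier frequencies via the DFT
I_all = np.abs(np.fft.fft(X))**2 / n

# Sample-autocovariance representation at w_j, here j = 5
j = 5
w_j = 2 * np.pi * j / n
gamma_hat = [np.sum((X[k:] - Xbar) * (X[:n-k] - Xbar)) / n for k in range(n)]
rhs = gamma_hat[0] + 2 * sum(gamma_hat[k] * np.cos(w_j * k) for k in range(1, n))

assert abs(I_all[j] - rhs) < 1e-8    # the two expressions agree
```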


With numpy.fft.fft imported as fft and a_0, . . . , a_{n−1} stored in NumPy array a, the function call fft(a) returns the values A_0, . . . , A_{n−1} as a NumPy array. It follows that, when the data X_0, . . . , X_{n−1} is stored in array X, the values I(ω_j) at the Fourier frequencies, which are given by

    (1/n) |∑_{t=0}^{n−1} X_t exp(−i 2π tj/n)|²,   j = 0, . . . , n − 1

can be computed by np.abs(fft(X))**2 / len(X).

Note: The numpy function abs acts elementwise, and correctly handles complex numbers (by computing their modulus, which is exactly what we need).

Here's a function that puts all this together
    import numpy as np
    from numpy.fft import fft

    def periodogram(x):
        "Argument x is a NumPy array containing the time series data"
        n = len(x)
        I_w = np.abs(fft(x))**2 / n
        w = 2 * np.pi * np.arange(n) / n       # Fourier frequencies
        w, I_w = w[:int(n/2)], I_w[:int(n/2)]  # Truncate to interval [0, pi]
        return w, I_w

Let's generate some data for this function using the linproc.py module from the main repository and see how it performs. (The module is described in the lecture on linear processes.) Here's a code snippet that, once the preceding code has been run, generates data from the process

    X_t = 0.5 X_{t−1} + ε_t − 0.8 ε_{t−2}   (5.88)

where {ε_t} is white noise with unit variance, and compares the periodogram to the actual spectral density
    import matplotlib.pyplot as plt
    from linproc import linearProcess

    n = 40                         # Data size
    phi, theta = 0.5, (0, -0.8)    # AR and MA parameters
    lp = linearProcess(phi, theta)
    X = lp.simulation(ts_length=n)

    fig, ax = plt.subplots()
    x, y = periodogram(X)
    ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
    x_sd, y_sd = lp.spectral_density(two_pi=False, resolution=120)
    ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
    ax.legend()
    plt.show()


Running this should produce a figure similar to this one. This estimate looks rather disappointing, but the data size is only 40, so perhaps it's not surprising that the estimate is poor. However, if we try again with n = 1200 the outcome is not much better. The periodogram is far too irregular relative to the underlying spectral density. This brings us to our next topic.

Smoothing
There are two related issues here.

One is that, given the way the fast Fourier transform is implemented, the number of points ω at which I(ω) is estimated increases in line with the amount of data. In other words, although we have more data, we are also using it to estimate more values. A second issue is that densities of all types are fundamentally hard to estimate without parametric assumptions. Typically, nonparametric estimation of densities requires some degree of smoothing.

The standard way that smoothing is applied to periodograms is by taking local averages. In other words, the value I(ω_j) is replaced with a weighted average of the adjacent values

    I(ω_{j−p}), I(ω_{j−p+1}), . . . , I(ω_j), . . . , I(ω_{j+p})

This weighted average can be written as

    I_S(ω_j) := ∑_{ℓ=−p}^{p} w(ℓ) I(ω_{j+ℓ})   (5.89)

where the weights w(−p), . . . , w(p) are a sequence of 2p + 1 nonnegative values summing to one. In general, larger values of p indicate more smoothing; more on this below. The next figure shows the kind of sequence typically used. Note the smaller weights towards the edges and larger weights in the center, so that more distant values from I(ω_j) have less weight than closer ones in the sum (5.89).
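A minimal direct implementation of (5.89) might look as follows (our own sketch; the estspec module discussed below uses np.convolve with standard window shapes instead). The triangular weight sequence is hypothetical, and endpoints are handled by wrapping, which is legitimate because the periodogram is 2π-periodic:

```python
import numpy as np

def smooth_periodogram(I_vals, p=3):
    """
    Replace each I(w_j) with the local weighted average (5.89), using a
    triangular weight sequence w(-p), ..., w(p) that sums to one.
    """
    weights = p + 1 - np.abs(np.arange(-p, p + 1))  # triangular shape
    weights = weights / weights.sum()               # normalize to sum to one
    n = len(I_vals)
    out = np.empty(n)
    for j in range(n):
        idx = np.arange(j - p, j + p + 1) % n       # wrap at the endpoints
        out[j] = np.sum(weights * I_vals[idx])
    return out

I_vals = np.abs(np.random.randn(32))
I_s = smooth_periodogram(I_vals)
assert I_s.shape == I_vals.shape
assert np.all(I_s >= 0)
```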

Estimation with Smoothing

Our next step is to provide code that will not only estimate the periodogram but also provide smoothing as required. This is accomplished in the module estspec, available in the main repository. The first two functions in the file estspec.py are printed below.


""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: estspec.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 """ from __future__ import division, print_function # Omit for Python 3.x import numpy as np from numpy.fft import fft from pandas import ols, Series def smooth(x, window_len=7, window='hanning'): """ Smooth the data in x using convolution with a window of requested size and type. Parameters: * x is a flat NumPy array --- the data to smooth * window_len is an odd integer --- the length of the window * window is a string giving the window type ('flat', 'hanning', 'hamming', 'bartlett' or 'blackman') Application of the smoothing window at the top and bottom of x is done by reflecting x around these points to extend it sufficiently in each direction. """ if len(x) < window_len: raise ValueError, "Input vector length must be at least window length." if window_len < 3: raise ValueError, "Window length must be at least 3." if not window_len % 2: # window_len is even window_len +=1 print("Window length reset to {}".format(window_len)) windows = {'hanning': np.hanning, 'hamming': np.hamming, 'bartlett': np.bartlett, 'blackman': np.blackman} # === reflect x around x[0] and x[-1] prior to convolution === # k = int(window_len / 2) xb = x[:k] # First k elements xt = x[-k:] # Last k elements s = np.concatenate((xb[::-1], x, xt[::-1])) # === select window values === #

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

5.8. ESTIMATION OF SPECTRA

378

if window == 'flat': w = np.ones(window_len) # moving average else: try: w = windows[window](window_len) except KeyError: print("Unrecognized window type. Defaulting to 'hanning'.") w = windows['hanning'](window_len) return np.convolve(w / w.sum(), s, mode='valid') def periodogram(x, window=None, window_len=7): """ Computes the periodogram I(w) = (1 / n) | sum_{t=0}^{n-1} x_t e^{itw} |^2 at the Fourier frequences w_j := 2 pi j / n, j = 0, ..., n - 1, using the fast Fourier transform. Only the frequences w_j in [0, pi] and corresponding values I(w_j) are returned. If a window type is given then smoothing is performed. * x is a flat NumPy array --- the time series data * window is a string giving the window type ('flat', 'hanning', 'hamming', 'bartlett' or 'blackman') * window_len is an odd integer --- the length of the window """ n = len(x) I_w = np.abs(fft(x))**2 / n w = 2 * np.pi * np.arange(n) / n w, I_w = w[:int(n/2)+1], I_w[:int(n/2)+1] # Take only values on [0, pi] if window: I_w = smooth(I_w, window_len=window_len, window=window) return w, I_w

The listing displays two functions, smooth() and periodogram(). The periodogram() function returns a periodogram, optionally smoothed via the smooth() function. Regarding the smooth() function, since smoothing adds a nontrivial amount of computation, we have applied a fairly terse array-centric method based around np.convolve. Readers are left to either explore or simply use this code according to their interests.

The next three figures each show smoothed and unsmoothed periodograms, as well as the true spectral density. (The model is the same as before, see equation (5.88), and there are 400 observations.) From top figure to bottom, the window length is varied from small to large.


In looking at the figure, we can see that for this model and data size, the window length chosen in the middle figure provides the best fit
Relative to this value, the first window length provides insufficient smoothing, while the third gives too much smoothing
Of course in real estimation problems the true spectral density is not visible, and the choice of appropriate smoothing will have to be made based on judgement, priors or some other theory

Pre-Filtering and Smoothing

In the code listing above we showed the first two functions in the file estspec.py
There is also a third function in the file called ar_periodogram(), which adds a pre-processing step to periodogram smoothing
First we describe the basic idea, and after that we give the code
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient
2. Compute the periodogram associated with the transformed data
3. Reverse the effect of the transformation on the periodogram, so that it now estimates the spectral density of the original process
Step 1 is called pre-filtering or pre-whitening, while step 3 is called recoloring
The first step is called pre-whitening because the transformation is usually designed to turn the data into something closer to white noise
Why would this be desirable in terms of spectral density estimation?
The reason is that we are smoothing our estimated periodogram based on estimated values at nearby points; recall (5.89)
The underlying assumption that makes this a good idea is that the true spectral density is relatively regular: the value of I(ω) is close to that of I(ω′) when ω′ is close to ω
This will not be true in all cases, but it is certainly true for white noise
For white noise, I is as regular as possible: it is a constant function
In this case, values of I(ω′) at points ω′ near to ω provide the maximum possible amount of information about the value I(ω)
Another way to put this is that if I is relatively constant, then we can use a large amount of smoothing without introducing too much bias

The AR(1) Setting

Let's examine this idea more carefully in a particular setting, where the data is assumed to be AR(1)
(More general ARMA settings can be handled using similar techniques to those described below)


Suppose in particular that {X_t} is covariance stationary and AR(1), with

    X_{t+1} = α + φ X_t + ε_{t+1}        (5.90)

where α and φ ∈ (−1, 1) are unknown parameters and {ε_t} is white noise
It follows that if we regress X_{t+1} on X_t and an intercept, the residuals will approximate white noise
Let
  g be the spectral density of {ε_t}, a constant function, as discussed above
  I_0 be the periodogram estimated from the residuals, an estimate of g
  f be the spectral density of {X_t}, the object we are trying to estimate
In view of an earlier result we obtained while discussing ARMA processes, f and g are related by

    f(ω) = |1 − φ e^{iω}|^{−2} g(ω)        (5.91)

This suggests that the recoloring step, which constructs an estimate I of f from I_0, should set

    I(ω) = |1 − φ̂ e^{iω}|^{−2} I_0(ω)

where φ̂ is the OLS estimate of φ
The code for ar_periodogram(), the third function in the module estspec, does exactly this
def ar_periodogram(x, window='hanning', window_len=7):
    """
    Compute periodogram from data x, using prewhitening, smoothing and
    recoloring.  The data is fitted to an AR(1) model for prewhitening,
    and the residuals are used to compute a first-pass periodogram with
    smoothing.  The fitted coefficients are then used for recoloring.

    Parameters:
        * x is a NumPy array containing time series data
        * window is a string indicating window type
        * window_len is an odd integer

    See the periodogram function documentation for more details on the
    window arguments.
    """
    # === run regression === #
    x_current, x_lagged = x[1:], x[:-1]  # x_t and x_{t-1}
    x_current, x_lagged = Series(x_current), Series(x_lagged)  # pandas series
    results = ols(y=x_current, x=x_lagged, intercept=True, nw_lags=1)
    e_hat = results.resid.values
    phi = results.beta['x']
    # === compute periodogram on residuals === #


    w, I_w = periodogram(e_hat, window=window, window_len=window_len)
    # === recolor and return === #
    I_w = I_w / np.abs(1 - phi * np.exp(1j * w))**2
    return w, I_w
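The AR(1) spectral density relation used in the recoloring step is easy to check numerically. For an AR(1) with coefficient φ and shock variance σ², the autocovariances are γ(k) = σ² φ^{|k|}/(1 − φ²), and summing γ(k) e^{−iωk} should reproduce σ² |1 − φ e^{iω}|^{−2} (our own sanity check, not part of estspec.py):

```python
import numpy as np

phi, sigma2, omega = 0.9, 1.0, 1.0  # AR(1) coefficient, shock variance, a frequency

# Closed form used in the recoloring step
f_closed = sigma2 / np.abs(1 - phi * np.exp(1j * omega))**2

# Direct (truncated) sum of autocovariances gamma(k) = sigma2 * phi^|k| / (1 - phi^2)
ks = np.arange(-2000, 2001)
gammas = sigma2 * phi**np.abs(ks) / (1 - phi**2)
f_sum = np.sum(gammas * np.exp(-1j * omega * ks)).real

print(abs(f_closed - f_sum) < 1e-8)  # True: the two expressions agree
```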

The next figure shows realizations of the two kinds of smoothed periodograms
1. "standard smoothed periodogram", the ordinary smoothed periodogram, and
2. "AR smoothed periodogram", the pre-whitened and recolored one generated by ar_periodogram()
The periodograms are calculated from time series drawn from (5.90) with α = 0 and φ = 0.9
Each time series is of length 150
The difference between the three subfigures is just randomness; each one uses a different draw of the time series


In all cases, periodograms are fit with the hamming window and a window length of 65
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer to the true spectral density

Exercises
Exercise 1 Replicate this figure (modulo randomness)
The model is as in equation (5.88) and there are 400 observations
For the smoothed periodogram, the window type is "hamming"
Solution: View solution

Exercise 2 Replicate this figure (modulo randomness)
The model is as in equation (5.90), with α = 0, φ = 0.9 and 150 observations in each time series
All periodograms are fit with the hamming window and a window length of 65
Solution: View solution

Exercise 3 To be written.  The exercise will be to use the code from this lecture to download FRED data and generate periodograms for different kinds of macroeconomic data.

5.9 Optimal Taxation

Overview
In this lecture we study optimal fiscal policy in a linear quadratic setting
We slightly modify a well-known model of Robert Lucas and Nancy Stokey [LucasStokey1983] so that convenient formulas for solving linear-quadratic models can be applied to simplify the calculations
The economy consists of a representative household and a benevolent government
The government finances an exogenous stream of government purchases with state-contingent loans and a linear tax on labor income
A linear tax is sometimes called a flat-rate tax
The household maximizes utility by choosing paths for consumption and labor, taking prices and the government's tax rate and borrowing plans as given
Maximum attainable utility for the household depends on the government's tax and borrowing plans
The Ramsey problem [Ramsey1927] is to choose tax and borrowing plans that maximize the household's welfare, taking the household's optimizing behavior as given


There is a large number of competitive equilibria, indexed by different government fiscal policies
The Ramsey planner chooses the best competitive equilibrium
We want to study the dynamics of tax rates, tax revenues and government debt under a Ramsey plan
Because the Lucas and Stokey model features state-contingent government debt, the government debt dynamics differ substantially from those in a model of Robert Barro [Barro1979]
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and Francois R. Velde
We cover only the key features of the problem in this lecture, leaving you to refer to that source for additional results and intuition

Model Features

* Linear quadratic (LQ) model
* Representative household
* Stochastic dynamic programming over an infinite horizon
* Distortionary taxation

The Ramsey Problem

We begin by outlining the key assumptions regarding technology, households and the government sector

Technology

Labor can be converted one-for-one into a single, non-storable consumption good
In the usual spirit of the LQ model, the amount of labor supplied in each period is unrestricted
This is unrealistic, but helpful when it comes to solving the model
Realistic labor supply can be induced by suitable parameter values

Households

Consider a representative household who chooses a path {ℓ_t, c_t} for labor and consumption to maximize

    −(1/2) E Σ_{t=0}^∞ β^t [ (c_t − b_t)² + ℓ_t² ]        (5.92)

subject to the budget constraint

    E Σ_{t=0}^∞ β^t p_t^0 [ d_t + (1 − τ_t) ℓ_t + s_t − c_t ] = 0        (5.93)

Here
  β is a discount factor in (0, 1)
  p_t^0 is a state price at time t


  b_t is a stochastic preference parameter
  d_t is an endowment process
  τ_t is a flat tax rate on labor income
  s_t is a promised time-t coupon payment on debt issued by the government
The budget constraint requires that the present value of consumption be restricted to the present value of endowments, labor income and coupon payments on bond holdings

Government

The government imposes a linear tax on labor income, fully committing to a stochastic path of tax rates at time zero
The government also issues state-contingent debt
Given government tax and borrowing plans, we can construct a competitive equilibrium with distorting government taxes
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare of the representative consumer

Exogenous Variables

Endowments, government expenditure, the preference parameter b_t and promised coupon payments on initial government debt s_t are all exogenous, and given by
  d_t = S_d x_t
  g_t = S_g x_t
  b_t = S_b x_t
  s_t = S_s x_t
The matrices S_d, S_g, S_b, S_s are primitives and {x_t} is an exogenous stochastic process taking values in R^k
We consider two specifications for {x_t}
1. Discrete case: {x_t} is a discrete state Markov chain with transition matrix P
2. VAR case: {x_t} obeys x_{t+1} = A x_t + C w_{t+1}, where {w_t} is independent zero mean Gaussian with identity covariance matrix

Feasibility

The period-by-period feasibility restriction for this economy is

    c_t + g_t = d_t + ℓ_t        (5.94)

A labor-consumption process {ℓ_t, c_t} is called feasible if (5.94) holds for all t

Government budget constraint

Where p_t^0 is a scaled Arrow-Debreu price, the time zero government budget constraint is

    E Σ_{t=0}^∞ β^t p_t^0 (s_t + g_t − τ_t ℓ_t) = 0        (5.95)


Equilibrium

An equilibrium is a feasible allocation {ℓ_t, c_t}, a sequence of prices {p_t} and a tax system {τ_t} such that
1. The allocation {ℓ_t, c_t} is optimal for the household given {p_t} and {τ_t}
2. The government's budget constraint (5.95) is satisfied
The Ramsey problem is to choose the equilibrium {ℓ_t, c_t, τ_t, p_t} that maximizes the household's welfare
If {ℓ_t, c_t, τ_t, p_t} is a solution to the Ramsey problem, then {τ_t} is called the Ramsey plan
The solution procedure we adopt is
1. Use the first order conditions from the household problem to pin down prices and allocations given {τ_t}
2. Use these expressions to rewrite the government budget constraint (5.95) in terms of exogenous variables and allocations
3. Maximize the household's objective function (5.92) subject to the constraint from the last step and the feasibility constraint (5.94)
The solution to this maximization problem pins down all quantities of interest

Solution

Step one is to obtain the first order conditions for the household's problem, taking taxes and prices as given
Letting μ be the Lagrange multiplier on (5.93), the first order conditions are p_t = (c_t − b_t)/μ and ℓ_t = (c_t − b_t)(1 − τ_t)
Rearranging and normalizing at μ = b_0 − c_0, we can write these conditions as

    p_t = (b_t − c_t) / (b_0 − c_0)   and   τ_t = 1 − ℓ_t / (b_t − c_t)        (5.96)

Substituting (5.96) into the government's budget constraint (5.95) yields

    E Σ_{t=0}^∞ β^t [ (b_t − c_t)(s_t + g_t − ℓ_t) + ℓ_t² ] = 0        (5.97)

The Ramsey problem now amounts to maximizing (5.92) subject to (5.97) and (5.94)
The associated Lagrangian is

    L = E Σ_{t=0}^∞ β^t { −(1/2) [ (c_t − b_t)² + ℓ_t² ] + λ [ (b_t − c_t)(ℓ_t − s_t − g_t) − ℓ_t² ] + μ_t [ d_t + ℓ_t − c_t − g_t ] }        (5.98)

The first order conditions associated with c_t and ℓ_t are

    −(c_t − b_t) + λ [ −ℓ_t + (g_t + s_t) ] = μ_t

and

    ℓ_t − λ [ (b_t − c_t) − 2 ℓ_t ] = μ_t

Combining these last two equalities with (5.94) and working through the algebra, one can show that

    ℓ_t = ℓ̄_t − ν m_t   and   c_t = c̄_t − ν m_t        (5.99)

where
  ν := λ / (1 + 2λ)
  ℓ̄_t := (b_t − d_t + g_t) / 2
  c̄_t := (b_t + d_t − g_t) / 2
  m_t := (b_t − d_t − s_t) / 2
Apart from ν, all of these quantities are expressed in terms of exogenous variables
To solve for ν, we can use the government's budget constraint again
The term inside the brackets in (5.97) is (b_t − c_t)(s_t + g_t) − (b_t − c_t) ℓ_t + ℓ_t²
Using (5.99), the definitions above and the fact that ℓ̄ = b − c̄, this term can be rewritten as

    (b_t − c̄_t)(g_t + s_t) + 2 m_t² (ν² − ν)

Reinserting into (5.97), we get

    E [ Σ_{t=0}^∞ β^t (b_t − c̄_t)(g_t + s_t) ] + (ν² − ν) E [ Σ_{t=0}^∞ β^t 2 m_t² ] = 0        (5.100)

Although it might not be clear yet, we are nearly there:
  The two expectations terms in (5.100) can be solved for in terms of model primitives
  This in turn allows us to solve for the Lagrange multiplier
  With the multiplier in hand, we can go back and solve for the allocations via (5.99)
  Once we have the allocations, prices and the tax system can be derived from (5.96)

Solving the Quadratic Term

Let's consider how to obtain the term ν in (5.100)
If we can solve the two expected sums

    b0 := E [ Σ_{t=0}^∞ β^t (b_t − c̄_t)(g_t + s_t) ]   and   a0 := E [ Σ_{t=0}^∞ β^t 2 m_t² ]        (5.101)

then the problem reduces to solving b0 + a0 (ν² − ν) = 0 for ν
Provided that 4 b0 < a0, there is a unique solution ν ∈ (0, 1/2), and a unique corresponding λ > 0
Let's work out how to solve the expectations terms in (5.101)
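As a small sanity check (with made-up values of a0 and b0 rather than model-implied ones), the quadratic b0 + a0(ν² − ν) = 0 is solved by the smaller root of a0 ν² − a0 ν + b0 = 0, which lies in (0, 1/2) whenever 4 b0 < a0:

```python
import numpy as np

a0, b0 = 2.0, 0.3   # hypothetical values satisfying 4 * b0 < a0

# Roots of a0 * nu**2 - a0 * nu + b0 = 0; take the smaller one
disc = a0**2 - 4 * a0 * b0
nu = 0.5 * (a0 - np.sqrt(disc)) / a0

print(0 < nu < 0.5)                          # True
print(abs(b0 + a0 * (nu**2 - nu)) < 1e-12)   # True: nu solves the quadratic
```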


For the first one, the random variable (b_t − c̄_t)(g_t + s_t) inside the summation can be expressed as

    (1/2) x_t' (S_b − S_d + S_g)' (S_g + S_s) x_t

For the second expectation in (5.101), the random variable 2 m_t² can be written as

    (1/2) x_t' (S_b − S_d − S_s)' (S_b − S_d − S_s) x_t

It follows that both of these expectations terms are special cases of the expression

    q(x_0) = E Σ_{t=0}^∞ β^t x_t' H x_t        (5.102)

where H is a conformable matrix, and x_t' is the transpose of column vector x_t
Suppose first that {x_t} is the Gaussian VAR described above
In this case, the formula for computing q(x_0) is known to be q(x_0) = x_0' Q x_0 + v, where
  Q is the solution to Q = H + β A' Q A, and
  v = trace(C' Q C) β / (1 − β)
The first equation is known as a discrete Lyapunov equation, and can be solved using this function
Next suppose that {x_t} is the discrete Markov process described above
Suppose further that each x_t takes values in the state space {x¹, ..., x^N} ⊂ R^k
Let h : R^k → R be a given function, and suppose that we wish to evaluate

    q(x_0) = E Σ_{t=0}^∞ β^t h(x_t)   given   x_0 = x^j

For example, in the discussion above, h(x_t) = x_t' H x_t
It is legitimate to pass the expectation through the sum, leading to

    q(x_0) = Σ_{t=0}^∞ β^t (P^t h)[j]        (5.103)

Here
  P^t is the t-th power of the transition matrix P
  h is, with some abuse of notation, the vector (h(x¹), ..., h(x^N))'
  (P^t h)[j] indicates the j-th element of P^t h
It can be shown that (5.103) is in fact equal to the j-th element of the vector (I − βP)^{−1} h
This last fact is applied in the calculations below


Other Variables

We are interested in tracking several other variables besides the ones described above
One is the present value of government obligations outstanding at time t, which can be expressed as

    B_t := E_t Σ_{j=0}^∞ β^j p^t_{t+j} (τ_{t+j} ℓ_{t+j} − g_{t+j})        (5.104)

Using our expression for prices and the Ramsey plan, we can also write B_t as

    B_t = E_t Σ_{j=0}^∞ β^j [ (b_{t+j} − c_{t+j})(ℓ_{t+j} − g_{t+j}) − ℓ_{t+j}² ] / (b_t − c_t)

This variation is more convenient for computation
Yet another way to write B_t is

    B_t = Σ_{j=0}^∞ R_{tj}^{−1} E_t [ τ_{t+j} ℓ_{t+j} − g_{t+j} ]

where

    R_{tj}^{−1} := E_t β^j p^t_{t+j}

Here R_{tj} can be thought of as the gross j-period risk-free rate on holding government debt between t and t + j
Furthermore, letting R_t be the one-period risk-free rate, we define

    π_{t+1} := B_{t+1} − R_t [ B_t − (τ_t ℓ_t − g_t) ]

and

    Π_t := Σ_{s=0}^t π_s

The term π_{t+1} is the payout on the public's portfolio of government debt
As shown in the original manuscript, if we distort one-step-ahead transition probabilities by the adjustment factor

    ξ_t := p^t_{t+1} / E_t p^t_{t+1}

then Π_t is a martingale under the distorted probabilities
See the treatment in the manuscript for more discussion and intuition
For now we will concern ourselves with computation

Implementation
The following code provides functions for
1. Solving for the Ramsey plan given a specification of the economy
2. Simulating the dynamics of the major variables


The file is called lqramsey.py, and is provided in the main repository
Description and clarifications are given below
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: lqramsey.py Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski LastModified: 11/08/2013 This module provides code to compute Ramsey equilibria in a LQ economy with distortionary taxation. The program computes allocations (consumption, leisure), tax rates, revenues, the net present value of the debt and other related quantities. Functions for plotting the results are also provided below. See the lecture at http://quant-econ.net/lqramsey.html for a description of the model. """ import sys import numpy as np from numpy import sqrt, max, eye, dot, zeros, cumsum, array from numpy.random import randn import scipy.linalg import matplotlib.pyplot as plt from collections import namedtuple from rank_nullspace import nullspace import mc_tools from quadsums import var_quadratic_sum

# == Set up a namedtuple to store data on the model economy == #
Economy = namedtuple('economy',
                     ('beta',      # Discount factor
                      'Sg',        # Govt spending selector matrix
                      'Sd',        # Exogenous endowment selector matrix
                      'Sb',        # Utility parameter selector matrix
                      'Ss',        # Coupon payments selector matrix
                      'discrete',  # Discrete or continuous -- boolean
                      'proc'))     # Stochastic process parameters

# == Set up a namedtuple to store return values for compute_paths() == #
Path = namedtuple('path',
                  ('g',    # Govt spending
                   'd',    # Endowment
                   'b',    # Utility shift parameter
                   's',    # Coupon payment on existing debt
                   'c',    # Consumption
                   'l',    # Labor
                   'p',    # Price

                   'tau',  # Tax rate
                   'rvn',  # Revenue
                   'B',    # Govt debt
                   'R',    # Risk free gross return
                   'pi',   # One-period risk-free interest rate
                   'Pi',   # Cumulative rate of return, adjusted
                   'xi'))  # Adjustment factor for Pi
def compute_paths(T, econ):
    """
    Compute simulated time paths for exogenous and endogenous variables.

    Parameters
    ===========
    T: int
        Length of the simulation

    econ: a namedtuple of type 'Economy', containing
        beta     - Discount factor
        Sg       - Govt spending selector matrix
        Sd       - Exogenous endowment selector matrix
        Sb       - Utility parameter selector matrix
        Ss       - Coupon payments selector matrix
        discrete - Discrete exogenous process (True or False)
        proc     - Stochastic process parameters

    Returns
    ========
    path: a namedtuple of type 'Path', containing
        g   - Govt spending
        d   - Endowment
        b   - Utility shift parameter
        s   - Coupon payment on existing debt
        c   - Consumption
        l   - Labor
        p   - Price
        tau - Tax rate
        rvn - Revenue
        B   - Govt debt
        R   - Risk free gross return
        pi  - One-period risk-free interest rate
        Pi  - Cumulative rate of return, adjusted
        xi  - Adjustment factor for Pi

    The corresponding values are flat numpy ndarrays.
    """
    # == Simplify names == #
    beta, Sg, Sd, Sb, Ss = econ.beta, econ.Sg, econ.Sd, econ.Sb, econ.Ss

    if econ.discrete:

        P, x_vals = econ.proc
    else:
        A, C = econ.proc

    # == Simulate the exogenous process x == #
    if econ.discrete:
        state = mc_tools.sample_path(P, init=0, sample_size=T)
        x = x_vals[:, state]
    else:
        # == Generate an initial condition x0 satisfying x0 = A x0 == #
        nx, nx = A.shape
        x0 = nullspace((eye(nx) - A))
        x0 = -x0 if (x0[nx-1] < 0) else x0
        x0 = x0 / x0[nx-1]
        # == Generate a time series x of length T starting from x0 == #
        nx, nw = C.shape
        x = zeros((nx, T))
        w = randn(nw, T)
        x[:, 0] = x0.T
        for t in range(1, T):
            x[:, t] = dot(A, x[:, t-1]) + dot(C, w[:, t])

    # == Compute exogenous variable sequences == #
    g, d, b, s = (dot(S, x).flatten() for S in (Sg, Sd, Sb, Ss))

    # == Solve for Lagrange multiplier in the govt budget constraint == #
    ## In fact we solve for nu = lambda / (1 + 2*lambda).  Here nu is the
    ## solution to a quadratic equation a(nu**2 - nu) + b = 0 where
    ## a and b are expected discounted sums of quadratic forms of the state.
    Sm = Sb - Sd - Ss
    # == Compute a and b == #
    if econ.discrete:
        ns = P.shape[0]
        F = scipy.linalg.inv(np.identity(ns) - beta * P)
        a0 = 0.5 * dot(F, dot(Sm, x_vals).T**2)[0]
        H = dot(Sb - Sd + Sg, x_vals) * dot(Sg + Ss, x_vals)
        b0 = 0.5 * dot(F, H.T)[0]
        a0, b0 = float(a0), float(b0)
    else:
        H = dot(Sm.T, Sm)
        a0 = 0.5 * var_quadratic_sum(A, C, H, beta, x0)
        H = dot((Sb - Sd + Sg).T, (Sg + Ss))
        b0 = 0.5 * var_quadratic_sum(A, C, H, beta, x0)

    # == Test that nu has a real solution before assigning == #
    warning_msg = """
    Hint: you probably set government spending too {}.  Elect a {}
    Congress and start over.
    """
    disc = a0**2 - 4 * a0 * b0
    if disc >= 0:
        nu = 0.5 * (a0 - sqrt(disc)) / a0


else: print "There is no Ramsey equilibrium for these parameters." print warning_msg.format('high', 'Republican') sys.exit(0) # == Test that the Lagrange multiplier has the right sign == # if nu * (0.5 - nu) < 0: print "Negative multiplier on the government budget constraint." print warning_msg.format('low', 'Democratic') sys.exit(0) # == Solve for the allocation Sc = 0.5 * (Sb + Sd - Sg - nu Sl = 0.5 * (Sb - Sd + Sg - nu c = dot(Sc, x).flatten() l = dot(Sl, x).flatten() p = dot(Sb - Sc, x).flatten() tau = 1 - l / (b - c) rvn = l * tau given nu and x == # * Sm) * Sm) # Price without normalization

    # == Compute remaining variables == #
    if econ.discrete:
        H = dot(Sb - Sc, x_vals) * dot(Sl - Sg, x_vals) - dot(Sl, x_vals)**2
        temp = dot(F, H.T).flatten()
        B = temp[state] / p
        H = dot(P[state, :], dot(Sb - Sc, x_vals).T).flatten()
        R = p / (beta * H)
        temp = dot(P[state, :], dot(Sb - Sc, x_vals).T).flatten()
        xi = p[1:] / temp[:T-1]
    else:
        H = dot((Sb - Sc).T, Sl - Sg) - dot(Sl.T, Sl)
        L = np.empty(T)
        for t in range(T):
            L[t] = var_quadratic_sum(A, C, H, beta, x[:, t])
        B = L / p
        Rinv = (beta * dot(dot(Sb - Sc, A), x)).flatten() / p
        R = 1 / Rinv
        AF1 = dot(Sb - Sc, x[:, 1:])
        AF2 = dot(dot(Sb - Sc, A), x[:, :T-1])
        xi = AF1 / AF2
        xi = xi.flatten()

    pi = B[1:] - R[:T-1] * B[:T-1] - rvn[:T-1] + g[:T-1]
    Pi = cumsum(pi * xi)

    # == Prepare return values == #
    path = Path(g=g, d=d, b=b, s=s, c=c, l=l, p=p,


                tau=tau, rvn=rvn, B=B, R=R, pi=pi, Pi=Pi, xi=xi)
    return path

def gen_fig_1(path):
    """
    The parameter is the path namedtuple returned by compute_paths().
    See the docstring of that function for details.
    """
    T = len(path.c)

    # == Prepare axes == #
    num_rows, num_cols = 2, 2
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
    plt.subplots_adjust(hspace=0.4)
    for i in range(num_rows):
        for j in range(num_cols):
            axes[i, j].grid()
            axes[i, j].set_xlabel(r'Time')
    bbox = (0., 1.02, 1., .102)
    legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
    p_args = {'lw': 2, 'alpha': 0.7}

    # == Plot consumption, govt expenditure and revenue == #
    ax = axes[0, 0]
    ax.plot(path.rvn, label=r'$\tau_t \ell_t$', **p_args)
    ax.plot(path.g, label=r'$g_t$', **p_args)
    ax.plot(path.c, label=r'$c_t$', **p_args)
    ax.legend(ncol=3, **legend_args)

    # == Plot govt expenditure and debt == #
    ax = axes[0, 1]
    ax.plot(range(1, T+1), path.rvn, label=r'$\tau_t \ell_t$', **p_args)
    ax.plot(range(1, T+1), path.g, label=r'$g_t$', **p_args)
    ax.plot(range(1, T), path.B[1:T], label=r'$B_{t+1}$', **p_args)
    ax.legend(ncol=3, **legend_args)

    # == Plot risk free return == #
    ax = axes[1, 0]
    ax.plot(range(1, T+1), path.R - 1, label=r'$R_t - 1$', **p_args)
    ax.legend(ncol=1, **legend_args)

    # == Plot revenue, expenditure and risk free rate == #
    ax = axes[1, 1]
    ax.plot(range(1, T+1), path.rvn, label=r'$\tau_t \ell_t$', **p_args)


    ax.plot(range(1, T+1), path.g, label=r'$g_t$', **p_args)
    ax.plot(range(1, T), path.pi, label=r'$\pi_{t+1}$', **p_args)
    ax.legend(ncol=3, **legend_args)

    plt.show()

def gen_fig_2(path):
    """
    The parameter is the path namedtuple returned by compute_paths().
    See the docstring of that function for details.
    """
    T = len(path.c)

    # == Prepare axes == #
    num_rows, num_cols = 2, 1
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
    plt.subplots_adjust(hspace=0.5)
    bbox = (0., 1.02, 1., .102)
    legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
    p_args = {'lw': 2, 'alpha': 0.7}

    # == Plot adjustment factor == #
    ax = axes[0]
    ax.plot(range(2, T+1), path.xi, label=r'$\xi_t$', **p_args)
    ax.grid()
    ax.set_xlabel(r'Time')
    ax.legend(ncol=1, **legend_args)

    # == Plot adjusted cumulative return == #
    ax = axes[1]
    ax.plot(range(2, T+1), path.Pi, label=r'$\Pi_t$', **p_args)
    ax.grid()
    ax.set_xlabel(r'Time')
    ax.legend(ncol=1, **legend_args)

    plt.show()

Comments on the Code

The function var_quadratic_sum imported from quadsums is for computing the value of (5.102) when the exogenous process {x_t} is of the VAR type described above
Below the definition of the function, you will see definitions of two namedtuple objects, Economy and Path
The first is used to collect all the parameters and primitives of a given LQ economy, while the second collects output of the computations
In Python, a namedtuple is a popular data type from the collections module of the standard library that replicates the functionality of a tuple, but also allows you to assign a name to each tuple element


These elements can then be referenced via dotted attribute notation; see for example the use of path in the function gen_fig_1()
The benefits of using namedtuples:
* Keeps content organized by meaning
* Helps reduce the number of global variables
Other than that, our code is long but relatively straightforward
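A minimal illustration of the namedtuple pattern used in lqramsey.py (the Point type here is purely for demonstration):

```python
from collections import namedtuple

# A small record type in the same spirit as Economy and Path
Point = namedtuple('Point', ('x', 'y'))

p = Point(x=1.0, y=2.0)
print(p.x + p.y)  # fields accessed by name, prints 3.0
print(p[0])       # still behaves like an ordinary tuple, prints 1.0
```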

Examples
Let's look at two examples of usage

The Continuous Case

Our first example adopts the VAR specification described above
Regarding the primitives, we set
  β = 1/1.05
  b_t = 2.135 and s_t = d_t = 0 for all t
Government spending evolves according to

    g_{t+1} − μ_g = ρ (g_t − μ_g) + C_g w_{g,t+1}

with ρ = 0.7, μ_g = 0.35 and C_g = μ_g √(1 − ρ²) / 10
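As a quick check on this specification (our own aside), note that with the state x_t = (g_t, 1)' and the matrix A used in the listing, the vector (μ_g, 1)' is a fixed point of A, so μ_g = 0.35 is the stationary mean of government spending:

```python
import numpy as np

rho, mg = 0.7, 0.35
A = np.identity(2)
A[0, :] = rho, mg * (1 - rho)

xbar = np.array([mg, 1.0])                 # candidate stationary mean
print(np.allclose(np.dot(A, xbar), xbar))  # True: (mg, 1) is a fixed point
```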

""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: lqramsey_ar1.py Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski LastModified: 11/08/2013 Example 1: Govt spending is AR(1) and state is (g, 1). """ import numpy as np from numpy import array from lqramsey import * # == Parameters == # beta = 1 / 1.05 rho, mg = .7, .35 A = np.identity(2) A[0,:] = rho, mg * (1-rho) C = np.zeros((2, 1)) C[0, 0] = np.sqrt(1 - rho**2) * mg / 10 Sg = array((1, 0)).reshape(1, 2) Sd = array((0, 0)).reshape(1, 2) Sb = array((0, 2.135)).reshape(1, 2) Ss = array((0, 0)).reshape(1, 2)


economy = Economy(beta=beta, Sg=Sg, Sd=Sd, Sb=Sb,
                  Ss=Ss, discrete=False, proc=(A, C))

T = 50
path = compute_paths(T, economy)
gen_fig_1(path)

Running the program produces the figure

The legends on the figures indicate the variables being tracked
Most obvious from the figure is tax smoothing, in the sense that tax revenue is much less variable than government expenditure
After running the code above, if you then execute gen_fig_2(path) from your IPython shell you will produce the figure
See the original manuscript for comments and interpretation

The Discrete Case

Our second example adopts a discrete Markov specification for the exogenous process
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: lqramsey_discrete.py Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski LastModified: 11/08/2013


LQ Ramsey model with discrete exogenous process.
"""
import numpy as np
from numpy import array
from lqramsey import *

# == Parameters == #
beta = 1 / 1.05
P = array([[0.8, 0.2, 0.0],
           [0.0, 0.5, 0.5],
           [0.0, 0.0, 1.0]])
# == Possible states of the world == #
# Each column is a state of the world.  The rows are [g d b s 1]
x_vals = array([[0.5, 0.5, 0.25],
                [0.0, 0.0, 0.0],
                [2.2, 2.2, 2.2],
                [0.0, 0.0, 0.0],
                [1.0, 1.0, 1.0]])
Sg = array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = array((0, 0, 0, 1, 0)).reshape(1, 5)
economy = Economy(beta=beta,

                  Sg=Sg, Sd=Sd, Sb=Sb,
                  Ss=Ss, discrete=True, proc=(P, x_vals))

T = 15
path = compute_paths(T, economy)
gen_fig_1(path)

The call gen_fig_1(path) generates the figure

while gen_fig_2(path) generates the figure
See the original manuscript for comments and interpretation

Exercises
Exercise 1 Modify the VAR example given above, setting

    g_{t+1} − μ_g = ρ (g_{t−3} − μ_g) + C_g w_{g,t+1}

with ρ = 0.95 and C_g = 0.7 √(1 − ρ²)

Produce the corresponding figures
Solution: View solution


CHAPTER SIX

SOLUTIONS TO EXERCISES
This page collects solutions to all exercises in the course
Note: all of these Python files can be downloaded from the main repository

6.1 Exercises from An Introductory Example

Solution to Exercise 1
def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

Solution to Exercise 2
from random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform(0, 1)
        if U < p:
            count = count + 1  # Or count += 1
    print(count)

Solution to Exercise 3
Consider the circle of diameter 1 embedded in the unit square
Let A be its area and let r = 1/2 be its radius
If we know π then we can compute A via A = πr²
But here the point is to compute π, which we can do by π = A/r²


Summary: If we can estimate the area of the unit circle, then dividing by r² = (1/2)² = 1/4 gives an estimate of π
We estimate the area by sampling bivariate uniforms and looking at the fraction that fall into the unit circle
from __future__ import division  # Omit if using Python 3.x
from random import uniform
from math import sqrt

n = 100000
count = 0
for i in range(n):
    u, v = uniform(0, 1), uniform(0, 1)
    d = sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n
print(area_estimate * 4)  # dividing by radius**2

Solution to Exercise 4
from random import uniform

payoff = 0
count = 0
for i in range(10):
    U = uniform(0, 1)
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)

Solution to Exercise 5
from pylab import plot, show
from random import normalvariate

alpha = 0.9
ts_length = 200
current_x = 0
x_values = []
for i in range(ts_length):
    x_values.append(current_x)
    current_x = alpha * current_x + normalvariate(0, 1)


plot(x_values, 'b-')
show()

Solution to Exercise 6
from pylab import plot, show, legend
from random import normalvariate

alphas = [0.0, 0.8, 0.98]
ts_length = 200
for alpha in alphas:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = alpha * current_x + normalvariate(0, 1)
    plot(x_values, label='alpha = ' + str(alpha))
legend()
show()

6.2 Exercises from Python Essentials

Solution to Exercise 1
Part 1 solution
One solution is
>>> sum([x * y for x, y in zip(x_vals, y_vals)])

Incidentally, this also works

>>> sum(x * y for x, y in zip(x_vals, y_vals))

Part 2 solution
One solution is
>>> sum([x % 2 == 0 for x in range(100)])

or just
>>> sum(x % 2 == 0 for x in range(100))

Some (less natural) alternatives, which help to illustrate the flexibility of list comprehensions, are


>>> len([x for x in range(100) if x % 2 == 0])

and
>>> sum([1 for x in range(100) if x % 2 == 0])

Part 3 solution
One solution is
>>> pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
>>> sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])

Solution to Exercise 2
Solution
def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))
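As a quick sanity check (our addition, not part of the original exercise), evaluating the polynomial 1 + 2x + 3x^2 at x = 2 should give 17:

```python
def p(x, coeff):
    # Evaluate sum of coeff[i] * x**i, with enumerate supplying the powers
    return sum(a * x**i for i, a in enumerate(coeff))

print(p(2, (1, 2, 3)))  # 1 + 2*2 + 3*4 = 17
```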

Solution to Exercise 3
Solution
def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper():
            count += 1
    return count

Alternatively,
def f(string):
    return sum(char1 == char2 for char1, char2 in zip(string, string.upper()))
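Note that both versions count any character that equals its own uppercase form, so spaces, digits and punctuation are included in the count. A variant using str.isupper, sketched here as our own alternative, counts capital letters only:

```python
def count_caps(string):
    # str.isupper() is False for spaces, digits and punctuation,
    # so only genuine capital letters are counted
    return sum(letter.isupper() for letter in string)

print(count_caps('The Rain in Spain'))  # 3 capitals: T, R, S
```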

Solution to Exercise 4
Solution
def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset

Of course, if we use the set data type, then the solution is easier:


def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))
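For example (a quick check, not part of the original text):

```python
def f(seq_a, seq_b):
    # Subset check via the built-in set type
    return set(seq_a).issubset(set(seq_b))

print(f([1, 2], [1, 2, 3]))  # True
print(f([1, 4], [1, 2, 3]))  # False: 4 is not in the second sequence
```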

Solution to Exercise 5
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: linapprox.py Authors: John Stachurski, Thomas J. Sargent LastModified: 11/08/2013 """ from __future__ import division # Omit if using Python 3.x

def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ===========
    f : function
        The function to approximate

    x, a, b : scalars (floats or integers)
        Evaluation point and endpoints, with a <= x <= b

    n : integer
        Number of grid points

    Returns
    =========
    A float. The interpolant evaluated at x
    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)


6.3 Exercises from Object Oriented Programming


Solution to Exercise 1
class ecdf:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
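A short usage sketch (our addition): with observations [1, 2, 3, 4], the empirical CDF evaluated at 2.5 should be 0.5, since half the observations lie at or below that point:

```python
class ecdf:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        # Fraction of observations less than or equal to x
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)

F = ecdf([1, 2, 3, 4])
print(F(2.5))  # 0.5
```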

Solution to Exercise 2
class Polynomial:

    def __init__(self, coefficients):
        """
        Creates an instance of the Polynomial class representing

            p(x) = a_0 x^0 + ... + a_N x^N,

        where a_i = coefficients[i].
        """
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at x."
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y

    def differentiate(self):
        "Reset self.coefficients to those of p' instead of p."
        new_coefficients = []
        for i, a in enumerate(self.coefficients):
            new_coefficients.append(i * a)
        # Remove the first element, which is zero
        del new_coefficients[0]
        # And reset coefficients data to new values
        self.coefficients = new_coefficients


6.4 Exercises from More Language Features


Solution to Exercise 1
Here's the standard solution:
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    else:
        return x(t-1) + x(t-2)
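The recursive solution recomputes the same subproblems many times, so its running time grows exponentially in t. One common remedy, sketched here as our own addition (the cache argument is not part of the exercise), is to store previously computed values in a dictionary:

```python
def x(t, cache={0: 0, 1: 1}):
    # The (deliberately mutable) default dict persists across calls,
    # so each value x(t) is computed at most once
    if t not in cache:
        cache[t] = x(t-1) + x(t-2)
    return cache[t]

print(x(10))  # 55
```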

Solution to Exercise 2
One solution is as follows
def column_iterator(target_file, column_number):
    """A generator function for CSV files.  When called with a file name
    target_file (string) and column number column_number (integer), the
    generator function returns a generator which steps through the elements
    of column column_number in file target_file.
    """
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print date

Solution to Exercise 3
f = open('numbers.txt')

total = 0.0
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass

f.close()

print total


6.5 Exercises from NumPy


Solution to Exercise 1
import numpy as np

def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2,...]
    return np.dot(coef, y)
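A quick consistency check (our addition) against direct evaluation: with coefficients (1, 2, 3) and x = 2, the cumprod-based version should return 1 + 2·2 + 3·4 = 17:

```python
import numpy as np

def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2, ...]
    return np.dot(coef, y)

coef = np.array([1.0, 2.0, 3.0])
print(p(2.0, coef))  # 17.0
```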

Solution to Exercise 2
Here's our first pass at a solution:
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: discrete_rv0.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 """ from numpy import cumsum from numpy.random import uniform class discreteRV: """ Generates an array of draws from a discrete random variable with vector of probabilities given by q. """ def __init__(self, q): """ The argument q is a NumPy array, or array like, nonnegative and sums to 1 """ self.q = q self.Q = cumsum(q) def draw(self, k=1): """ Returns k draws from q. For each such draw, the value i is returned with probability q[i]. """ return self.Q.searchsorted(uniform(0, 1, size=k))

The logic is not obvious, but if you take your time and read it slowly, you will understand it. There is a problem here, however.


Suppose that q is altered after an instance of discreteRV is created, for example by


In [61]: q = (0.1, 0.9)

In [62]: d = discreteRV(q)

In [63]: d.q = (0.5, 0.5)

The problem is that Q does not change accordingly, and Q is the data used in the draw method. To deal with this, one option is to compute Q every time the draw method is called. But this is inefficient relative to computing Q once off. A better option is to use descriptors; you can refresh your memory here. Here's a solution using descriptors that behaves as we desire:
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: discrete_rv.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 """ from numpy import cumsum from numpy.random import uniform class discreteRV(object): """ Generates an array of draws from a discrete random variable with vector of probabilities given by q. """ def __init__(self, q): """ The argument q is a NumPy array, or array like, nonnegative and sums to 1 """ self._q = q self.Q = cumsum(q) def get_q(self): return self._q def set_q(self, val): self._q = val self.Q = cumsum(val) q = property(get_q, set_q) def draw(self, k=1): """ Returns k draws from q. For each such draw, the value i is returned


        with probability q[i].
        """
        return self.Q.searchsorted(uniform(0, 1, size=k))
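The core idea can be demonstrated with a toy class (ToyRV is our own illustration, not part of the lecture code): because q is a property, assigning to it goes through the setter, which rebuilds Q automatically:

```python
from numpy import cumsum

class ToyRV(object):

    def __init__(self, q):
        self.q = q  # goes through the setter below, which also builds Q

    def get_q(self):
        return self._q

    def set_q(self, val):
        self._q = val
        self.Q = cumsum(val)

    q = property(get_q, set_q)

d = ToyRV((0.1, 0.9))
d.q = (0.5, 0.5)
print(d.Q)  # Q was rebuilt automatically: [0.5, 1.0]
```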

Solution to Exercise 3
""" Origin: QEwP by John Stachurski and Thomas J. Sargent Filename: ecdf.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Implements the empirical distribution function. """ import numpy as np import matplotlib.pyplot as plt class ecdf: def __init__(self, observations): self.observations = np.asarray(observations) def __call__(self, x): return np.mean(self.observations <= x) def plot(self, a=None, b=None): # === choose reasonable interval if [a, b] not specified === # if not a: a = self.observations.min() - self.observations.std() if not b: b = self.observations.max() + self.observations.std() # === generate plot === # x_vals = np.linspace(a, b, num=100) f = np.vectorize(self.__call__) plt.plot(x_vals, f(x_vals)) plt.show()

6.6 Exercises from SciPy


Solution to Exercise 1
Here's a recursive implementation of the bisection algorithm, in the file bisection2.py
""" Origin: QE by John Stachurski and Thomas J. Sargent


Filename: bisection2.py
Authors: John Stachurski, Thomas J. Sargent
LastModified: 11/08/2013
"""

def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b

    if upper - lower < tol:
        return 0.5 * (upper + lower)
    else:
        middle = 0.5 * (upper + lower)
        print('Current mid point = {}'.format(middle))
        if f(middle) > 0:   # Implies root is between lower and middle
            return bisect(f, lower, middle)
        else:               # Implies root is between middle and upper
            return bisect(f, middle, upper)

We can test it as follows


In [23]: run bisection2.py

In [24]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1

In [25]: bisect(f, 0, 1)
Current mid point = 0.5
Current mid point = 0.25
Current mid point = 0.375
Current mid point = 0.4375
Current mid point = 0.40625
Current mid point = 0.421875
Current mid point = 0.4140625
Current mid point = 0.41015625
Current mid point = 0.408203125
Current mid point = 0.4091796875
Current mid point = 0.40869140625
Current mid point = 0.408447265625
Current mid point = 0.408325195312
Current mid point = 0.408264160156

6.7 Exercises from Pandas


Solution to Exercise 1
import numpy as np
import pandas as pd


import datetime as dt
import pandas.io.data as web
import matplotlib.pyplot as plt

ticker_list = {'INTC': 'Intel',
               'MSFT': 'Microsoft',
               'IBM': 'IBM',
               'BHP': 'BHP',
               'RSH': 'RadioShack',
               'TM': 'Toyota',
               'AAPL': 'Apple',
               'AMZN': 'Amazon',
               'BA': 'Boeing',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google',
               'SNE': 'Sony',
               'PTR': 'PetroChina'}

start = dt.datetime(2013, 1, 1)
end = dt.datetime.today()

price_change = {}

for ticker in ticker_list:
    prices = web.DataReader(ticker, 'yahoo', start, end)
    closing_prices = prices['Close']
    change = 100 * (closing_prices[-1] - closing_prices[0]) / closing_prices[0]
    name = ticker_list[ticker]
    price_change[name] = change

pc = pd.Series(price_change)
pc.sort()
fig, ax = plt.subplots()
pc.plot(kind='bar', ax=ax)
plt.show()

6.8 Exercises from LLN and CLT


Solution to Exercise 1
Here is one solution. You might have to modify or delete the lines starting with rc, depending on your configuration.
""" Illustrates the delta method, a consequence of the central limit theorem. """ import numpy as np from scipy.stats import uniform, norm import matplotlib.pyplot as plt


from matplotlib import rc

# == Specifying font, needs LaTeX integration == #
rc('font', **{'family': 'serif', 'serif': ['Palatino']})
rc('text', usetex=True)

# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
mu, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# == Generate obs of sqrt{n} (g(\bar X_n) - g(\mu)) == #
data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1)  # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(mu))

# == Plot == #
asymptotic_sd = g_prime(mu) * s
fig, ax = plt.subplots()
xmin = -3 * g_prime(mu) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, normed=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = r"$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

The program produces a figure that looks as follows. What happens when you replace [0, π/2] with [0, π]? In this case, the mean of this distribution is π/2, and since g' = cos, we have g'(π/2) = 0. Hence the conditions of the delta theorem are not satisfied.

Solution to Exercise 2
First we want to verify the claim that

$$\sqrt{n} \, Q (\bar X_n - \mu) \stackrel{d}{\to} N(0, I)$$

This is straightforward given the facts presented in the exercise. Let

$$Y_n := \sqrt{n} (\bar X_n - \mu) \quad \text{and} \quad Y \sim N(0, \Sigma)$$


By the multivariate CLT and the continuous mapping theorem, we have $Q Y_n \stackrel{d}{\to} Q Y$. Since linear combinations of normal random variables are normal, the vector $QY$ is also normal. Its mean is clearly $0$, and its variance-covariance matrix is

$$\mathrm{Var}[QY] = Q \, \mathrm{Var}[Y] \, Q' = Q \Sigma Q' = I$$

In conclusion, $Q Y_n \stackrel{d}{\to} QY \sim N(0, I)$, which is what we aimed to show. Now we turn to the simulation exercise. Our solution is as follows:
""" Illustrates a consequence of the vector CLT. The underlying random vector is X = (W, U + W), where W is Uniform(-1, 1), U is Uniform(-2, 2), and U and W are independent of each other. """ import numpy as np from scipy.stats import uniform, chi2 from scipy.linalg import inv, sqrtm import matplotlib.pyplot as plt # == Set parameters == # n = 250 replications = 50000 dw = uniform(loc=-1, scale=2) # Uniform(-1, 1) du = uniform(loc=-2, scale=4) # Uniform(-2, 2) sw, su = dw.std(), du.std() vw, vu = sw**2, su**2 Sigma = ((vw, vw), (vw, vw + vu))

Sigma = np.array(Sigma)

# == Compute Sigma^{-1/2} == #
Q = inv(sqrtm(Sigma))

# == Generate observations of the normalized sample mean == #
error_obs = np.empty((2, replications))
for i in range(replications):
    # == Generate one sequence of bivariate shocks == #
    X = np.empty((2, n))
    W = dw.rvs(n)
    U = du.rvs(n)
    # == Construct the n observations of the random vector == #
    X[0, :] = W
    X[1, :] = W + U
    # == Construct the i-th observation of Y_n == #
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# == Premultiply by Q and then take the squared norm == #
temp = np.dot(Q, error_obs)
chisq_obs = np.sum(temp**2, axis=0)

# == Plot == #
fig, ax = plt.subplots()
xmax = 8
ax.set_xlim(0, 8)
xgrid = np.linspace(0, 8, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, normed=True)
plt.show()

When run it produces the following figure (modulo randomness). As expected, the histogram fits well with the chi-squared distribution with 2 degrees of freedom.

6.9 Exercises from Finite Markov Chains


Solution to Exercise 1
""" Compute the fraction of time that the worker spends unemployed, and compare it to the stationary probability. """ import numpy as np import matplotlib.pyplot as plt import mc_tools alpha = beta = 0.1


N = 10000
p = beta / (alpha + beta)

P = ((1 - alpha, alpha),
     (beta, 1 - beta))
P = np.array(P)  # Careful: P and p are distinct

fig, ax = plt.subplots()
ax.set_ylim(-0.25, 0.25)
ax.grid()
ax.hlines(0, 0, N, lw=2, alpha=0.6)  # Horizontal line at zero

for x0, col in ((0, 'blue'), (1, 'green')):
    # == Generate time series for worker that starts at x0 == #
    X = mc_tools.sample_path(P, x0, N)
    # == Compute fraction of time spent unemployed, for each n == #
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    # == Plot == #
    ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
    ax.plot(X_bar - p, color=col, label=r'$X_0 = \, {} $'.format(x0))
    ax.plot(X_bar - p, 'k-', alpha=0.6)  # Overlay in black to make lines clearer

ax.legend(loc='upper right')
plt.show()

Solution to Exercise 2
""" Return list of pages, ordered by rank """ from __future__ import print_function, division # Omit if using Python 3.x


import numpy as np
import mc_tools
from operator import itemgetter
import re

infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

n = 14  # Total number of web pages (nodes)

# == Create a matrix Q indicating existence of links == #
#  * Q[i, j] = 1 if there is a link from i to j
#  * Q[i, j] = 0 otherwise
Q = np.zeros((n, n), dtype=int)
f = open(infile, 'r')
edges = f.readlines()
f.close()
for edge in edges:
    from_node, to_node = re.findall('\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1

# == Create the corresponding Markov matrix P == #
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()

# == Compute the stationary distribution r == #
r = mc_tools.compute_stationary(P)
ranked_pages = {alphabet[i]: r[i] for i in range(n)}

# == Print solution, sorted from highest to lowest rank == #
print('Rankings\n ***')
for name, rank in sorted(ranked_pages.iteritems(), key=itemgetter(1), reverse=1):
    print('{0}: {1:.4}'.format(name, rank))

Here's the output from the program:


Rankings
 ***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911


Solution to Exercise 3
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: tauchen.py Authors: John Stachurski and Thomas Sargent LastModified: 11/08/2013 Discretizes Gaussian linear AR(1) processes via Tauchen's method """ import numpy as np from scipy.stats import norm def approx_markov(rho, sigma_u, m=3, n=7): """ Computes the Markov matrix associated with a discretized version of the linear Gaussian AR(1) process y_{t+1} = rho * y_t + u_{t+1} according to Tauchen's method. zero mean. Parameters: * * * * rho is the correlation coefficient sigma_u is the standard deviation of u m parameterizes the width of the state space n is the number of states Here {u_t} is an iid Gaussian process with

Returns: * x, the state space, as a NumPy array * a matrix P, where P[i,j] is the probability of transitioning from x[i] to x[j] """ F = norm(loc=0, scale=sigma_u).cdf std_y = np.sqrt(sigma_u**2 / (1-rho**2)) x_max = m * std_y x_min = - x_max x = np.linspace(x_min, x_max, n) step = (x_max - x_min) / (n - 1) half_step = 0.5 * step P = np.empty((n, n))

# # # #

standard deviation of y_t top of discrete state space bottom of discrete state space discretized state space

for i in range(n): P[i, 0] = F(x[0]-rho * x[i] + half_step) P[i, n-1] = 1 - F(x[n-1] - rho * x[i] - half_step) for j in range(1, n-1): z = x[j] - rho * x[i]


            P[i, j] = F(z + half_step) - F(z - half_step)

    return x, P

6.10 Exercises from Schelling's Segregation Model


Solution to Exercise 1
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: schelling.py Authors: John Stachurski and Thomas J. Sargent LastModified: 11/08/2013 """ from random import uniform from math import sqrt import matplotlib.pyplot as plt num_of_type_0 = 250 num_of_type_1 = 250 num_neighbors = 10 require_same_type = 4 class Agent: def __init__(self, type): self.type = type self.draw_location() def draw_location(self): self.location = uniform(0, 1), uniform(0, 1) def get_distance(self, other): "Computes euclidean distance between self and other agent." a = (self.location[0] - other.location[0])**2 b = (self.location[1] - other.location[1])**2 return sqrt(a + b) def happy(self, agents): "True if sufficient number of nearest neighbors are of the same type." distances = [] # distances is a list of pairs (d, agent), where d is distance from # agent to self for agent in agents: if self != agent: distance = self.get_distance(agent) distances.append((distance, agent)) # == sort from smallest to largest, according to distance == #

# Number of agents regarded as neighbors # Want at least this many neighbors to be same type


        distances.sort()
        # And extract the neighboring agents
        neighbors = [agent for d, agent in distances[:num_neighbors]]
        # == count how many neighbors have the same type as self == #
        num_same_type = sum(self.type == agent.type for agent in neighbors)
        return num_same_type >= require_same_type

    def update(self, agents):
        "If not happy, then randomly choose new locations until happy."
        while not self.happy(agents):
            self.draw_location()

def plot_distribution(agents, cycle_num):
    "Plot the distribution of agents after cycle_num rounds of the loop."
    x_values_0, y_values_0 = [], []
    x_values_1, y_values_1 = [], []
    # == Obtain locations of each type == #
    for agent in agents:
        x, y = agent.location
        if agent.type == 0:
            x_values_0.append(x)
            y_values_0.append(y)
        else:
            x_values_1.append(x)
            y_values_1.append(y)
    fig, ax = plt.subplots()
    plot_args = {'markersize': 8, 'alpha': 0.6}
    ax.set_axis_bgcolor('azure')
    ax.plot(x_values_0, y_values_0, 'o', markerfacecolor='orange', **plot_args)
    ax.plot(x_values_1, y_values_1, 'o', markerfacecolor='green', **plot_args)
    ax.set_title('Cycle {}'.format(cycle_num - 1))
    fig.savefig('schelling_fig{}.png'.format(cycle_num))

# == Main == #

# == Create a list of agents == #
agents = [Agent(0) for i in range(num_of_type_0)]
agents.extend(Agent(1) for i in range(num_of_type_1))

count = 1
# == Loop until none wishes to move == #
while 1:
    print 'Entering loop ', count
    plot_distribution(agents, count)
    count += 1
    no_one_moved = True
    for agent in agents:
        old_location = agent.location
        agent.update(agents)
        if agent.location != old_location:
            no_one_moved = False


    if no_one_moved:
        break

6.11 Exercises from Linear State Space Models


Solution to Exercise 1
import numpy as np
import matplotlib.pyplot as plt
from lss import LSS

phi_0, phi_1, phi_2 = 1, 0.8, -0.8

A = [[phi_0, 0,     0],
     [0,     phi_1, phi_2],
     [0,     1,     0]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LSS(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots(figsize=(8, 4.6))
y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel(r'$y_t$', fontsize=16)
plt.show()

Solution to Exercise 2
import numpy as np
import matplotlib.pyplot as plt
from lss import LSS

phi_1, phi_2, phi_3, phi_4 = 0.5, -0.2, 0, 0.5
sigma = 0.2

A = [[phi_1, phi_2, phi_3, phi_4],
     [1,     0,     0,     0],
     [0,     1,     0,     0],
     [0,     0,     1,     0]]
C = [sigma, 0, 0, 0]
G = [1, 0, 0, 0]

ar = LSS(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)


fig, ax = plt.subplots(figsize=(8, 4.6))
y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel(r'$y_t$', fontsize=16)
plt.show()

Solution to Exercise 3
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from lss import LSS
import random

phi_1, phi_2, phi_3, phi_4 = 0.5, -0.2, 0, 0.5
sigma = 0.1

A = [[phi_1, phi_2, phi_3, phi_4],
     [1,     0,     0,     0],
     [0,     1,     0,     0],
     [0,     0,     1,     0]]
C = [sigma, 0, 0, 0]
G = [1, 0, 0, 0]

I = 20
T = 50
ar = LSS(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots()
ax.set_ylim(ymin, ymax)
ax.set_xlabel(r'time', fontsize=16)
ax.set_ylabel(r'$y_t$', fontsize=16)

ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label=r'$\bar y_t$')

m = ar.moment_sequence()
population_means = []
for t in range(T):
    mu_x, mu_y, Sigma_x, Sigma_y = m.next()


    population_means.append(mu_y)

ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.legend(ncol=2)
plt.show()

Solution to Exercise 4
import numpy as np
import matplotlib.pyplot as plt
from lss import LSS
import random

phi_1, phi_2, phi_3, phi_4 = 0.5, -0.2, 0, 0.5
sigma = 0.1

A = [[phi_1, phi_2, phi_3, phi_4],
     [1,     0,     0,     0],
     [0,     1,     0,     0],
     [0,     0,     1,     0]]
C = [sigma, 0, 0, 0]
G = [1, 0, 0, 0]

T0 = 10
T1 = 50
T2 = 75
T4 = 100

ar = LSS(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.8, 1.0

fig, ax = plt.subplots(figsize=(8, 5))
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel(r'$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)
ax.set_xticks((T0, T1, T2))
ax.set_xticklabels((r"$T$", r"$T'$", r"$T''$"), fontsize=14)

mu_x, mu_y, Sigma_x, Sigma_y = ar.stationary_distributions()
ar.mu_0 = mu_x
ar.Sigma_0 = Sigma_x

for i in range(80):
    rcolor = random.choice(('c', 'g', 'b'))
    x, y = ar.simulate(ts_length=T4)
    y = y.flatten()
    ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
    ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2]), 'ko', alpha=0.5)


plt.show()

6.12 Exercises from A First Look at the Kalman Filter


Solution to Exercise 1
To use kalman.py for this application, we need to set
theta = 10
A, G, Q, R = 1, 1, 0, 1
x_hat_0, Sigma_0 = 8, 1

We can then generate our draws of $y_t$ as $\theta$ plus $N(0, 1)$ noise, and update the densities using these values. The code for doing this is below. In the code, note the use of LaTeX expressions inside the figure labels and title. This requires proper integration of LaTeX and Matplotlib. (If you haven't set this up, then replace the LaTeX expressions with ordinary text.)
import numpy as np
import matplotlib.pyplot as plt
from kalman import Kalman
from scipy.stats import norm

## Parameters
theta = 10
A, G, Q, R = 1, 1, 0, 1
x_hat_0, Sigma_0 = 8, 1

## Initialize Kalman filter
kalman = Kalman(A, G, Q, R)
kalman.set_state(x_hat_0, Sigma_0)

N = 5
fig, ax = plt.subplots()
xgrid = np.linspace(theta - 5, theta + 2, 200)
for i in range(N):
    # Record the current predicted mean and variance, and plot their densities
    m, v = kalman.current_x_hat, kalman.current_Sigma
    m, v = float(m), float(v)
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=r'$t=%d$' % i)
    # Generate the noisy signal
    y = theta + norm.rvs(size=1)
    # Update the Kalman filter
    kalman.update(y)

ax.set_title(r'First %d densities when $\theta = %.1f$' % (N, theta))
ax.legend(loc='upper left')
plt.show()


Solution to Exercise 2
Borrowing code from the solution to Exercise 1, we can solve Exercise 2 as follows
import numpy as np
import matplotlib.pyplot as plt
from kalman import Kalman
from scipy.stats import norm
from scipy.integrate import quad

## Parameters
theta = 10
A, G, Q, R = 1, 1, 0, 1
x_hat_0, Sigma_0 = 8, 1
epsilon = 0.1

## Initialize Kalman filter
kalman = Kalman(A, G, Q, R)
kalman.set_state(x_hat_0, Sigma_0)

T = 600
z = np.empty(T)
for t in range(T):
    # Record the current predicted mean and variance
    m, v = kalman.current_x_hat, kalman.current_Sigma
    m, v = float(m), float(v)
    f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))
    integral, error = quad(f, theta - epsilon, theta + epsilon)
    z[t] = 1 - integral
    # Generate the noisy signal and update the Kalman filter
    kalman.update(theta + norm.rvs(size=1))

fig, ax = plt.subplots()
ax.set_ylim(0, 1)
ax.set_xlim(0, T)
ax.plot(range(T), z)
ax.fill_between(range(T), np.zeros(T), z, color="blue", alpha=0.2)
plt.show()

Solution to Exercise 3
from __future__ import print_function  # Remove for Python 3.x
import numpy as np
from numpy.random import multivariate_normal
import matplotlib.pyplot as plt
from scipy.linalg import eigvals
from kalman import Kalman

# === Define A, Q, G, R === #
G = np.eye(2)
R = 0.5 * np.eye(2)
A = [[0.5, 0.4],
     [0.6, 0.3]]


Q = 0.3 * np.eye(2)

# === Define the prior density === #
Sigma = [[0.9, 0.3],
         [0.3, 0.9]]
Sigma = np.array(Sigma)
x_hat = np.array([8, 8])

# === Initialize the Kalman filter === #
kn = Kalman(A, G, Q, R)
kn.set_state(x_hat, Sigma)

# === Set the true initial value of the state === #
x = np.zeros(2)

# == Print eigenvalues of A == #
print("Eigenvalues of A:")
print(eigvals(A))

# == Print stationary Sigma == #
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)

# === Generate the plot === #
T = 50
e1 = np.empty(T)
e2 = np.empty(T)
for t in range(T):
    # == Generate signal and update prediction == #
    y = multivariate_normal(mean=np.dot(G, x), cov=R)
    kn.update(y)
    # == Update state and record error == #
    Ax = np.dot(A, x)
    x = multivariate_normal(mean=Ax, cov=Q)
    e1[t] = np.sum((x - kn.current_x_hat)**2)
    e2[t] = np.sum((x - Ax)**2)

fig, ax = plt.subplots()
ax.plot(range(T), e1, 'k-', lw=2, alpha=0.6, label='Kalman filter error')
ax.plot(range(T), e2, 'g-', lw=2, alpha=0.6, label='conditional expectation error')
ax.legend()
plt.show()

Solution to Exercise 4
You can use the code from the preceding exercise.


6.13 Exercises from Shortest Paths


Solution to Exercise 1
""" Source: QE by John Stachurski and Thomas J. Sargent Filename: solution_shortpath.py Authors: John Stachurksi and Thomas J. Sargent LastModified: 11/08/2013 """ def read_graph(): """ Read in the graph from the data file. The graph is stored as a dictionary, where the keys are the nodes, and the values are a list of pairs (d, c), where d is a node and c is a number. If (d, c) is in the list for node n, then d can be reached from n at cost c. """ graph = {} infile = open('graph.txt') for line in infile: elements = line.split(',') node = elements.pop(0).strip() graph[node] = [] if node != 'node99': for element in elements: destination, cost = element.split() graph[node].append((destination.strip(), float(cost))) infile.close() return graph def update_J(J, graph): "The Bellman operator." next_J = {} for node in graph: if node == 'node99': next_J[node] = 0 else: next_J[node] = min(cost + J[dest] for dest, cost in graph[node]) return next_J def print_best_path(J, graph): """ Given a cost-to-go function, computes the best path. At each node n, the function prints the current location, looks at all nodes that can be reached from n, and moves to the node m which minimizes c + J[m], where c is the cost of moving to m. """ sum_costs = 0 current_location = 'node0' while current_location != 'node99': print current_location running_min = 1e100 # Any big number


        for destination, cost in graph[current_location]:
            cost_of_path = cost + J[destination]
            if cost_of_path < running_min:
                running_min = cost_of_path
                minimizer_cost = cost
                minimizer_dest = destination
        current_location = minimizer_dest
        sum_costs += minimizer_cost
    print 'node99'
    print
    print 'Cost: ', sum_costs

## Main loop
graph = read_graph()
M = 1e10
J = {}
for node in graph:
    J[node] = M
J['node99'] = 0

while 1:
    next_J = update_J(J, graph)
    if next_J == J:
        break
    else:
        J = next_J

print_best_path(J, graph)

6.14 Exercises from Infinite Horizon Dynamic Programming


Solution to Exercise 1
import matplotlib.pyplot as plt
from optgrowth import growthModel, bellman_operator, compute_greedy
from compute_fp import compute_fixed_point

alpha, beta = 0.65, 0.95
gm = growthModel()
true_sigma = (1 - alpha * beta) * gm.grid**alpha
w = 5 * gm.u(gm.grid) - 25  # Initial condition

fig, ax = plt.subplots(3, 1, figsize=(8, 10))

for i, n in enumerate((2, 4, 6)):
    ax[i].set_ylim(0, 1)
    ax[i].set_xlim(0, 2)
    ax[i].set_yticks((0, 1))


    ax[i].set_xticks((0, 2))
    v_star = compute_fixed_point(bellman_operator, gm, w, max_iter=n)
    sigma = compute_greedy(gm, v_star)
    ax[i].plot(gm.grid, sigma, 'b-', lw=2, alpha=0.8,
               label='approximate optimal policy')
    ax[i].plot(gm.grid, true_sigma, 'k-', lw=2, alpha=0.8,
               label='true optimal policy')
    ax[i].legend(loc='upper left')
    ax[i].set_title('{} value function iterations'.format(n))

plt.show()

Solution to Exercise 2
import matplotlib.pyplot as plt
import numpy as np
from scipy import interp
from optgrowth import growthModel, bellman_operator, compute_greedy
from compute_fp import compute_fixed_point

gm = growthModel()
w = 5 * gm.u(gm.grid) - 25  # To be used as an initial condition
discount_factors = (0.9, 0.94, 0.98)
series_length = 25

fig, ax = plt.subplots()
ax.set_xlabel("time")
ax.set_ylabel("capital")

for beta in discount_factors:
    # Compute the optimal policy given the discount factor
    gm.beta = beta
    v_star = compute_fixed_point(bellman_operator, gm, w, max_iter=20)
    sigma = compute_greedy(gm, v_star)
    # Compute the corresponding time series for capital
    k = np.empty(series_length)
    k[0] = 0.1
    sigma_function = lambda x: interp(x, gm.grid, sigma)
    for t in range(1, series_length):
        k[t] = gm.f(k[t-1]) - sigma_function(k[t-1])
    ax.plot(k, 'o-', lw=2, alpha=0.75, label=r'$\beta = {}$'.format(beta))

ax.legend(loc='lower right')
plt.show()


6.15 Exercises from LQ Control Problems


Solution to Exercise 1
Here's one solution. We use some fancy plot commands to get a certain style; feel free to use simpler ones
""" An LQ permanent income / life-cycle model with hump-shaped income y_t = m1 * t + m2 * t^2 + sigma w_{t+1} where {w_t} is iid N(0, 1) and the coefficients m1 and m2 are chosen so that p(t) = m1 * t + m2 * t^2 has an inverted U shape with p(0) = 0, p(T/2) = mu and p(T) = 0. """ from __future__ import division import numpy as np import matplotlib.pyplot as plt from lqcontrol import * # == Model parameters == # r = 0.05 beta = 1 / (1 + r) T = 50 c_bar = 1.5 sigma = 0.15 mu = 2 q = 1e4 m1 = T * (mu / (T/2)**2) m2 = - (mu / (T/2)**2) # == Formulate as an LQ problem == # Q = 1 R = np.zeros((4, 4)) Rf = np.zeros((4, 4)) Rf[0, 0] = q A = [[1 + r, -c_bar, m1, m2], [0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1]] B = [[-1], [0], [0], [0]] C = [[sigma], [0], [0], [0]] # == Compute solutions and simulate == #

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.15. EXERCISES FROM LQ CONTROL PROBLEMS

431

lq = LQ(Q, R, A, B, C, beta=beta, T=T, Rf=Rf) x0 = (0, 1, 0, 0) xp, up, wp = lq.compute_sequence(x0) # == Convert results back to assets, consumption and income == # ap = xp[0, :] # Assets c = up.flatten() + c_bar # Consumption time = np.arange(1, T+1) income = wp[0, 1:] + m1 * time + m2 * time**2 # Income # == Plot results == # n_rows = 2 fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10)) plt.subplots_adjust(hspace=0.5) for i in range(n_rows): axes[i].grid() axes[i].set_xlabel(r'Time') bbox = (0., 1.02, 1., .102) legend_args = {'bbox_to_anchor' : bbox, 'loc' : 3, 'mode' : 'expand'} p_args = {'lw' : 2, 'alpha' : 0.7} axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args) axes[0].plot(range(T), c, 'k-', label="consumption", **p_args) axes[1].plot(range(T+1), np.zeros(T+1), 'k-') axes[0].legend(ncol=2, **legend_args) axes[1].plot(range(T+1), ap.flatten(), 'b-', label="assets", **p_args) axes[1].plot(range(T), np.zeros(T), 'k-') axes[1].legend(ncol=1, **legend_args) plt.show()

Solution to Exercise 2
""" An permanent income / life-cycle model with polynomial growth in income over working life followed by a fixed retirement income. The model is solved by combining two LQ programming problems as described in the lecture. """ from __future__ import division import numpy as np import matplotlib.pyplot as plt from lqcontrol import * # == Model parameters == # r = 0.05 beta = 1 / (1 + r) T = 60 K = 40

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.15. EXERCISES FROM LQ CONTROL PROBLEMS

432

c_bar sigma mu q s m1 m2

= = = = = = =

4 0.35 4 1e4 1 2 * mu / K - mu / K**2

# == Formulate LQ problem 1 (retirement) == # Q = 1 R = np.zeros((4, 4)) Rf = np.zeros((4, 4)) Rf[0, 0] = q A = [[1 + r, s - c_bar, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1]] B = [[-1], [0], [0], [0]] C = [[0], [0], [0], [0]] # == Initialize LQ instance for retired agent == # lq_retired = LQ(Q, R, A, B, C, beta=beta, T=T-K, Rf=Rf) # == Iterate back to start of retirement, record final value function == # for i in range(T-K): lq_retired.update_values() Rf2 = lq_retired.P # == Formulate LQ problem 2 (working life) == # R = np.zeros((4, 4)) A = [[1 + r, -c_bar, m1, m2], [0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1]] B = [[-1], [0], [0], [0]] C = [[sigma], [0], [0], [0]] # == Set up working life LQ instance with terminal Rf from lq_retired == # lq_working = LQ(Q, R, A, B, C, beta=beta, T=K, Rf=Rf2) # == Simulate working state / control paths == # x0 = (0, 1, 0, 0)

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.15. EXERCISES FROM LQ CONTROL PROBLEMS

433

xp_w, up_w, wp_w = lq_working.compute_sequence(x0) # == Simulate retirement paths (note the initial condition) == # xp_r, up_r, wp_r = lq_retired.compute_sequence(xp_w[:, K]) # == Convert results back to assets, consumption and income == # xp = np.column_stack((xp_w, xp_r[:, 1:])) assets = xp[0, :] # Assets up = np.column_stack((up_w, up_r)) c = up.flatten() + c_bar # Consumption time = np.arange(1, K+1) income_w = wp_w[0, 1:K+1] + m1 * time + m2 * time**2 # Income income_r = np.ones(T-K) * s income = np.concatenate((income_w, income_r)) # == Plot results == # n_rows = 2 fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10)) plt.subplots_adjust(hspace=0.5) for i in range(n_rows): axes[i].grid() axes[i].set_xlabel(r'Time') bbox = (0., 1.02, 1., .102) legend_args = {'bbox_to_anchor' : bbox, 'loc' : 3, 'mode' : 'expand'} p_args = {'lw' : 2, 'alpha' : 0.7} axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args) axes[0].plot(range(T), c, 'k-', label="consumption", **p_args) axes[1].plot(range(T+1), np.zeros(T+1), 'k-') axes[0].legend(ncol=2, **legend_args) axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args) axes[1].plot(range(T), np.zeros(T), 'k-') axes[1].legend(ncol=1, **legend_args) plt.show()

Solution to Exercise 3
The first task is to find the matrices A, B, C, Q, R that define the LQ problem

Recall that x_t = (q̄_t, q_t, 1)', while u_t = q_{t+1} - q_t

Letting m0 := (a0 - c) / (2 a1) and m1 := 1 / (2 a1), we can write q̄_t = m0 + m1 d_t, and then, with some manipulation,

    q̄_{t+1} = m0 (1 - ρ) + ρ q̄_t + m1 σ w_{t+1}

By our definition of u_t, the dynamics of q_t are q_{t+1} = q_t + u_t

Using these facts you should be able to build the correct A, B, C matrices (and then check them against those found in the solution code below)


Suitable R, Q matrices can be found by inspecting the objective function, which we repeat here for convenience:

    min E Σ_{t=0}^∞ β^t { a1 (q_t - q̄_t)² + γ u_t² }

Our solution code is
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: solution_lqc_ex3.py An infinite horizon profit maximization problem for a monopolist with adjustment costs. """ from __future__ import division import numpy as np import matplotlib.pyplot as plt from lqcontrol import * # == Model parameters == # a0 = 5 a1 = 0.5 sigma = 0.15 rho = 0.9 gamma = 1 beta = 0.95 c = 2 T = 120 # == Useful constants == # m0 = (a0 - c) / (2 * a1) m1 = 1 / (2 * a1) # == Formulate LQ problem == # Q = gamma R = [[a1, -a1, 0], [-a1, a1, 0], [0, 0, 0]] A = [[rho, 0, m0 * (1 - rho)], [0, 1, 0], [0, 0, 1]] B = [[0], [1], [0]] C = [[m1 * sigma], [0], [0]] lq = LQ(Q, R, A, B, C=C, beta=beta) # == Simulate state / control paths == # x0 = (m0, 2, 1) xp, up, wp = lq.compute_sequence(x0, ts_length=150)
t =0

t )2 + u2 t a1 ( q t q t

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014


q_bar = xp[0, :]
q = xp[1, :]

# == Plot simulation results == #
fig, ax = plt.subplots(figsize=(10, 6.5))
ax.set_xlabel('Time')

# == Some fancy plotting stuff -- simplify if you prefer == #
bbox = (0., 1.01, 1., .101)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.6}

time = range(len(q))
ax.set_xlim(0, max(time))
ax.plot(time, q_bar, 'k-', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b-', lw=2, alpha=0.6, label=r'$q_t$')
ax.legend(ncol=2, **legend_args)
s = r'dynamics with $\gamma = {}$'.format(gamma)
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()

6.16 Exercises from Rational Expectations Equilibrium


Solution to Exercise 1
To map a problem into a discounted optimal linear control problem, we need to define

- a state vector x_t and a control vector u_t
- matrices A, B, Q, R that define preferences and the law of motion for the state

For the state and control vectors we choose

    x_t = (y_t, Y_t, 1)',    u_t = y_{t+1} - y_t

For A, B, Q, R we set

    A = [[1, 0, 0],
         [0, κ1, κ0],
         [0, 0, 1]],    B = (1, 0, 0)',

    R = [[0, a1/2, -a0/2],
         [a1/2, 0, 0],
         [-a0/2, 0, 0]],    Q = γ/2

By multiplying out you can confirm that

- x_t' R x_t + u_t' Q u_t = -r_t
- x_{t+1} = A x_t + B u_t

We can use the program lqcontrol.py from the main repository to solve the firm's problem at the stated parameter values
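The "multiplying out" step can be spot checked numerically. The following sketch is not part of the lecture code: it evaluates the quadratic form at one arbitrary state/control point, taking the firm's per-period payoff to be r_t = (a0 - a1 Y_t) y_t - (γ/2) u_t² as in the lecture, with the parameter values used in the solution code below.

```python
# Hypothetical spot check: verify x' R x + u' Q u = -r_t at one point
a0, a1, gamma = 100, 0.05, 10.0   # Parameter values from the solution code
y, Y, u = 2.0, 3.0, 0.5           # An arbitrary state/control point

x = [y, Y, 1.0]
R = [[0, a1/2, -a0/2],
     [a1/2, 0, 0],
     [-a0/2, 0, 0]]
Q = gamma / 2

# Quadratic form x' R x plus u' Q u
quadratic = sum(x[i] * R[i][j] * x[j] for i in range(3) for j in range(3))
quadratic += Q * u**2

# Per-period payoff r_t, as defined in the lecture (an assumption here)
r = (a0 - a1 * Y) * y - (gamma / 2) * u**2

print(abs(quadratic + r))  # Numerically zero
```

Any other point (y, Y, u) gives the same cancellation, since the identity holds term by term.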


This will return an LQ policy F with the interpretation u_t = -F x_t, or

    y_{t+1} - y_t = -F0 y_t - F1 Y_t - F2

Matching parameters with y_{t+1} = h0 + h1 y_t + h2 Y_t leads to

    h0 = -F2,    h1 = 1 - F0,    h2 = -F1

""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: solution_ree_ex1.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski Solves an exercise from the rational expectations module """ from __future__ import print_function import numpy as np from lqcontrol import LQ # == Model parameters == # a0 a1 beta gamma = = = = 100 0.05 0.95 10.0

# == Beliefs == # kappa0 kappa1 = 95.5 = 0.95

# == Formulate the LQ problem == # A = np.array([[1, 0, 0], [0, kappa1, kappa0], [0, 0, 1]]) B = np.array([1, 0, 0]) B.shape = 3, 1 R = np.array([[0, -a1/2, a0/2], [-a1/2, 0, 0], [a0/2, 0, 0]]) Q = -0.5 * gamma # == Solve for the optimal policy == # lq = LQ(Q, R, A, B, beta=beta) P, F, d = lq.stationary_values() F = F.flatten() out1 = "F = [{0:.3f}, {1:.3f}, {2:.3f}]".format(F[0], F[1], F[2]) h0, h1, h2 = -F[2], 1 - F[0], -F[1] out2 = "(h0, h1, h2) = ({0:.3f}, {1:.3f}, {2:.3f})".format(h0, h1, h2) print(out1) print(out2)

The output is F = [-0.000, 0.046, -96.949] and (h0, h1, h2) = (96.949, 1.000, -0.046), so that

    y_{t+1} = 96.949 + y_t - 0.046 Y_t        (6.1)
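The mapping from F to (h0, h1, h2) can be verified directly from these rounded output values:

```python
# Rounded policy vector from the output above; recall u_t = -F x_t
F = [-0.000, 0.046, -96.949]

# Matching y_{t+1} = h0 + h1 y_t + h2 Y_t against
# y_{t+1} - y_t = -F[0] y_t - F[1] Y_t - F[2]
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
print(h0, h1, h2)
```

which reproduces (96.949, 1.000, -0.046).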

6.16. EXERCISES FROM RATIONAL EXPECTATIONS EQUILIBRIUM

437

For the case n > 1 recall that Y_t = n y_t, which, combined with (6.1), yields

    Y_{t+1} = n (96.949 + y_t - 0.046 Y_t) = n 96.949 + (1 - 0.046 n) Y_t

Solution to Exercise 2
To determine whether a (κ0, κ1) pair forms the aggregate law of motion component of a rational expectations equilibrium, we can proceed as follows:

1. Determine the corresponding firm law of motion y_{t+1} = h0 + h1 y_t + h2 Y_t
2. Test whether the associated aggregate law Y_{t+1} = n h(Y_t / n, Y_t) evaluates to Y_{t+1} = κ0 + κ1 Y_t

In the second step we can use Y_t = n y_t = y_t (here n = 1), so that Y_{t+1} = n h(Y_t / n, Y_t) becomes

    Y_{t+1} = h(Y_t, Y_t) = h0 + (h1 + h2) Y_t

Hence to test the second step we can test κ0 = h0 and κ1 = h1 + h2

The following code implements this test
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: solution_ree_ex2.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski Solves an exercise from the rational expectations module """ from __future__ import print_function import numpy as np from lqcontrol import LQ from solution_ree_ex1 import beta, R, Q, B candidates = ( (94.0886298678, 0.923409232937), (93.2119845412, 0.984323478873), (95.0818452486, 0.952459076301) ) for kappa0, kappa1 in candidates: # == Form the associated law of motion == # A = np.array([[1, 0, 0], [0, kappa1, kappa0], [0, 0, 1]]) # == Solve the LQ problem for the firm == # lq = LQ(Q, R, A, B, beta=beta) P, F, d = lq.stationary_values() F = F.flatten() h0, h1, h2 = -F[2], 1 - F[0], -F[1] # == Test the equilibrium condition == # if np.allclose((kappa0, kappa1), (h0, h1 + h2)): print('Equilibrium pair =', kappa0, kappa1) print('(h0, h1, h2) = ', h0, h1, h2) break


The output tells us that the answer is pair (iii), which implies (h0, h1, h2) = (95.0819, 1.0000, -0.0475)

(Notice that we use np.allclose to test equality of floating point numbers, since exact equality is too strict)

Regarding the iterative algorithm, one could loop from a given (κ0, κ1) pair to the associated firm law and then to a new (κ0, κ1) pair

This amounts to implementing the operator described in the lecture

There is in general no guarantee that this iterative process will converge to a rational expectations equilibrium
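A sketch of that iteration follows. This is purely illustrative: firm_law is a hypothetical stand-in for the beliefs-to-firm-law map, which in practice means solving the firm's LQ problem as above and returning (h0, h1 + h2).

```python
def iterate_on_beliefs(firm_law, kappa0, kappa1, tol=1e-8, max_iter=500):
    """Iterate beliefs -> firm law -> new beliefs until the update is small.
    firm_law maps (kappa0, kappa1) to the implied (h0, h1 + h2)."""
    for i in range(max_iter):
        new_kappa0, new_kappa1 = firm_law(kappa0, kappa1)
        if abs(new_kappa0 - kappa0) + abs(new_kappa1 - kappa1) < tol:
            return new_kappa0, new_kappa1
        kappa0, kappa1 = new_kappa0, new_kappa1
    return kappa0, kappa1  # No guarantee of convergence in general

# Toy stand-in map (a contraction, so here the iteration does converge)
toy_law = lambda k0, k1: (0.5 * k0 + 47.5, 0.5 * k1 + 0.475)
print(iterate_on_beliefs(toy_law, 90.0, 0.9))
```

For this toy map the iterates converge to the fixed point (95.0, 0.95); with the true firm law there is, as noted, no such guarantee.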

Solution to Exercise 3
We are asked to write the planner problem as an LQ problem

For the state and control vectors we choose

    x_t = (Y_t, 1)',    u_t = Y_{t+1} - Y_t

For the LQ matrices we set

    A = [[1, 0], [0, 1]],    B = (1, 0)'

    R = [[a1/2, -a0/2], [-a0/2, 0]],    Q = γ/2

By multiplying out you can confirm that

- x_t' R x_t + u_t' Q u_t = -s(Y_t, Y_{t+1})
- x_{t+1} = A x_t + B u_t

By obtaining the optimal policy and using u_t = -F x_t, or

    Y_{t+1} - Y_t = -F0 Y_t - F1

we can obtain the implied aggregate law of motion via κ0 = -F1 and κ1 = 1 - F0

The Python code to solve this problem is below:
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: solution_ree_ex3.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski Solves an exercise from the rational expectations module """ from __future__ import print_function import numpy as np from lqcontrol import LQ from solution_ree_ex1 import a0, a1, beta, gamma # == Formulate the planner's LQ problem == #

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.16. EXERCISES FROM RATIONAL EXPECTATIONS EQUILIBRIUM

439

A B R Q

= = = =

np.array([[1, 0], [0, 1]]) np.array([[1], [0]]) -np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0]]) - gamma / 2

# == Solve for the optimal policy == # lq = LQ(Q, R, A, B, beta=beta) P, F, d = lq.stationary_values() # == Print the results == # F = F.flatten() kappa0, kappa1 = -F[1], 1 - F[0] print(kappa0, kappa1)

The output yields the same (κ0, κ1) pair obtained as an equilibrium from the previous exercise

Solution to Exercise 4
The monopolist's LQ problem is almost identical to the planner's problem from the previous exercise, except that

    R = [[a1, -a0/2], [-a0/2, 0]]

The problem can be solved as follows
""" Origin: QE by John Stachurski and Thomas J. Sargent Filename: solution_ree_ex4.py Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski Solves an exercise from the rational expectations module """ from __future__ import print_function import numpy as np from lqcontrol import LQ from solution_ree_ex1 import a0, a1, beta, gamma A B R Q = = = = np.array([[1, 0], [0, 1]]) np.array([[1], [0]]) - np.array([[a1, -a0 / 2], [-a0 / 2, 0]]) - gamma / 2

lq = LQ(Q, R, A, B, beta=beta) P, F, d = lq.stationary_values() F = F.flatten() m0, m1 = -F[1], 1 - F[0] print(m0, m1)

We see that the law of motion for the monopolist is approximately

    Y_{t+1} = 73.4729 + 0.9265 Y_t


In the rational expectations case the law of motion was approximately Y_{t+1} = 95.0818 + 0.9525 Y_t

One way to compare these two laws of motion is by their fixed points, which give long run equilibrium output in each case

For laws of the form Y_{t+1} = c0 + c1 Y_t, the fixed point is c0 / (1 - c1)

If you crunch the numbers, you will see that the monopolist adopts a lower long run quantity than obtained by the competitive market, implying a higher market price

This is analogous to the elementary static-case results
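Crunching those numbers is a one-liner, using c0 / (1 - c1) with the rounded coefficients above (a back-of-the-envelope check, not part of the lecture code):

```python
fixed_point = lambda c0, c1: c0 / (1 - c1)

monopoly = fixed_point(73.4729, 0.9265)      # Monopolist's long run output
competitive = fixed_point(95.0818, 0.9525)   # Rational expectations long run output
print(monopoly, competitive)
```

The monopolist settles on roughly 1000 units of long run output against roughly 2000 in the competitive case, confirming the lower quantity (and hence higher price).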

6.17 Exercises from Search with Offer Distribution Unknown


Solution to Exercise 1
""" Solves the "Offer Distribution Unknown" model by iterating on a guess of the reservation wage function. """ from scipy import interp import numpy as np from numpy import maximum as npmax import matplotlib.pyplot as plt from odu_vfi import searchProblem from scipy.integrate import fixed_quad from compute_fp import compute_fixed_point def res_wage_operator(sp, phi): """ Updates the reservation wage function guess phi via the operator Q. Returns the updated function Q phi, represented as the array new_phi. * sp is an instance of searchProblem, defined in odu_vfi * phi is a NumPy array with len(phi) = len(sp.pi_grid) """ beta, c, f, g, q = sp.beta, sp.c, sp.f, sp.g, sp.q # Simplify names phi_f = lambda p: interp(p, sp.pi_grid, phi) # Turn phi into a function new_phi = np.empty(len(phi)) for i, pi in enumerate(sp.pi_grid): def integrand(x): "Integral expression on right-hand side of operator" return npmax(x, phi_f(q(x, pi))) * (pi * f(x) + (1 - pi) * g(x)) integral, error = fixed_quad(integrand, 0, sp.w_max) new_phi[i] = (1 - beta) * c + beta * integral return new_phi if __name__ == '__main__': # If module is run rather than imported

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.18. EXERCISES FROM MODELING CAREER CHOICE

441

sp = searchProblem(pi_grid_size=50) phi_init = np.ones(len(sp.pi_grid)) w_bar = compute_fixed_point(res_wage_operator, sp, phi_init) fig, ax = plt.subplots() ax.plot(sp.pi_grid, w_bar, linewidth=2, color='black') ax.set_ylim(0, 2) ax.grid(axis='x', linewidth=0.25, linestyle='--', color='0.25') ax.grid(axis='y', linewidth=0.25, linestyle='--', color='0.25') ax.fill_between(sp.pi_grid, 0, w_bar, color='blue', alpha=0.15) ax.fill_between(sp.pi_grid, w_bar, 2, color='green', alpha=0.15) ax.text(0.42, 1.2, 'reject') ax.text(0.7, 1.8, 'accept') plt.show()

Here's a sample output generated by running this code

You should find that the run time is much shorter than that of the value function approach in odu_vfi.py

6.18 Exercises from Modeling Career Choice


Solution to Exercise 1
The sample path figures can be generated with the following code
import matplotlib.pyplot as plt
import numpy as np
from discrete_rv import discreteRV
from career import *
from compute_fp import compute_fixed_point

wp = workerProblem()
v_init = np.ones((wp.N, wp.N)) * 100
v = compute_fixed_point(bellman, wp, v_init)
optimal_policy = get_greedy(wp, v)
F = discreteRV(wp.F_probs)
G = discreteRV(wp.G_probs)

def gen_path(T=20):
    i = j = 0
    theta_index = []
    epsilon_index = []
    for t in range(T):
        if optimal_policy[i, j] == 1:    # Stay put
            pass
        elif optimal_policy[i, j] == 2:  # New job
            j = int(G.draw())
        else:                            # New life
            i, j = int(F.draw()), int(G.draw())
        theta_index.append(i)
        epsilon_index.append(j)
    return wp.theta[theta_index], wp.epsilon[epsilon_index]

theta_path, epsilon_path = gen_path()
fig = plt.figure()
ax1 = plt.subplot(211)
ax1.plot(epsilon_path, label='epsilon')
ax1.plot(theta_path, label='theta')
ax1.legend(loc='lower right')

theta_path, epsilon_path = gen_path()
ax2 = plt.subplot(212)
ax2.plot(epsilon_path, label='epsilon')
ax2.plot(theta_path, label='theta')
ax2.legend(loc='lower right')
plt.show()

Solution to Exercise 2
The median for the original parameterization can be computed as follows
import matplotlib.pyplot as plt
import numpy as np
from discrete_rv import discreteRV
from career import *
from compute_fp import compute_fixed_point

wp = workerProblem()
v_init = np.ones((wp.N, wp.N)) * 100
v = compute_fixed_point(bellman, wp, v_init)
optimal_policy = get_greedy(wp, v)
F = discreteRV(wp.F_probs)
G = discreteRV(wp.G_probs)

def gen_first_passage_time():
    t = 0
    i = j = 0
    while 1:
        if optimal_policy[i, j] == 1:    # Stay put
            return t
        elif optimal_policy[i, j] == 2:  # New job
            j = int(G.draw())
        else:                            # New life
            i, j = int(F.draw()), int(G.draw())
        t += 1

M = 25000  # Number of samples
samples = np.empty(M)
for i in range(M):
    samples[i] = gen_first_passage_time()
print np.median(samples)

To compute the median with β = 0.99 instead of the default value β = 0.95, replace wp = workerProblem() with wp = workerProblem(beta=0.99)

The medians are subject to randomness, but should be about 7 and 11 respectively. Not surprisingly, more patient workers will wait longer to settle down to their final job

Solution to Exercise 3
Here's the code to reproduce the original figure
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
from career import *
from compute_fp import compute_fixed_point

wp = workerProblem()
v_init = np.ones((wp.N, wp.N)) * 100
v = compute_fixed_point(bellman, wp, v_init)
optimal_policy = get_greedy(wp, v)

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)
tg, eg = np.meshgrid(wp.theta, wp.epsilon)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, optimal_policy.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, optimal_policy.T, colors='k', levels=lvls, linewidths=2)
ax.set_xlabel('theta', fontsize=14)
ax.set_ylabel('epsilon', fontsize=14)
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()


Now we want to set G_a = G_b = 100 and generate a new figure with these parameters. To do this we replace
wp = workerProblem()

with
wp = workerProblem(G_a=100, G_b=100)

The figure now looks as follows

The region for which the worker will stay put has grown, because the distribution for ε has become more concentrated around the mean, making high-paying jobs less realistic

6.19 Exercises from On-the-Job Search


Solution to Exercise 1
Here's code to produce the 45 degree diagram
import matplotlib.pyplot as plt
import random
from jv import workerProblem, bellman_operator
from compute_fp import compute_fixed_point
import numpy as np

# Set up
wp = workerProblem(grid_size=25)
G, pi, F = wp.G, wp.pi, wp.F  # Simplify names

v_init = wp.x_grid * 0.5
V = compute_fixed_point(bellman_operator, wp, v_init, max_iter=40)
s_policy, phi_policy = bellman_operator(wp, V, return_policies=True)

# Turn the policy function arrays into actual functions
s = lambda y: np.interp(y, wp.x_grid, s_policy)
phi = lambda y: np.interp(y, wp.x_grid, phi_policy)

def h(x, b, U):
    return (1 - b) * G(x, phi(x)) + b * max(G(x, phi(x)), U)

plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)
ticks = (0.25, 0.5, 0.75, 1.0)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xlabel(r'$x_t$', fontsize=16)
ax.set_ylabel(r'$x_{t+1}$', fontsize=16, rotation='horizontal')
ax.plot(plot_grid, plot_grid, 'k--')  # 45 degree line

for x in plot_grid:
    for i in range(50):
        b = 1 if random.uniform(0, 1) < pi(s(x)) else 0
        U = wp.F.rvs(1)
        y = h(x, b, U)
        ax.plot(x, y, 'go', alpha=0.25)
plt.show()

Here's the figure that it produces

Looking at the dynamics, we can see that

- If x_t is below about 0.2 the dynamics are random, but x_{t+1} > x_t is very likely
- As x_t increases the dynamics become deterministic, and x_t converges to a steady state value close to 1

Referring back to the "Optimal policies" figure, we see that x_t ≈ 1 means that s_t = s(x_t) ≈ 0 and φ_t = φ(x_t) ≈ 0.6

Solution to Exercise 2
The figure can be produced as follows


from matplotlib import pyplot as plt
from jv import workerProblem
import numpy as np

# Set up
wp = workerProblem(grid_size=25)

def xbar(phi):
    return (wp.A * phi**wp.alpha)**(1 / (1 - wp.alpha))

phi_grid = np.linspace(0, 1, 100)
fig, ax = plt.subplots()
ax.set_xlabel(r'$\phi$', fontsize=16)
ax.plot(phi_grid, [xbar(phi) * (1 - phi) for phi in phi_grid], 'b-',
        label=r'$w^*(\phi)$')
ax.legend(loc='upper left')
plt.show()

It generates the following plot

Observe that the maximizer is around 0.6

This is similar to the long run value for φ obtained in exercise 1

Hence the behaviour of the infinitely patient worker is similar to that of the worker with β = 0.96

This seems reasonable, and helps us confirm that our dynamic programming solutions are probably correct


6.20 Exercises from Estimation of Spectra


Solution to Exercise 1
import numpy as np
import matplotlib.pyplot as plt
from linproc import linearProcess
from estspec import periodogram

## Data
n = 400
phi = 0.5
theta = 0, -0.8
lp = linearProcess(phi, theta)
X = lp.simulation(ts_length=n)

fig, ax = plt.subplots(3, 1)

for i, wl in enumerate((15, 55, 175)):  # Window lengths
    x, y = periodogram(X)
    ax[i].plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
    x_sd, y_sd = lp.spectral_density(two_pi=False, resolution=120)
    ax[i].plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
    x, y_smoothed = periodogram(X, window='hamming', window_len=wl)
    ax[i].plot(x, y_smoothed, 'k-', lw=2, label='smoothed periodogram')
    ax[i].legend()
    ax[i].set_title('window length = {}'.format(wl))

plt.show()

Solution to Exercise 2
import numpy as np
import matplotlib.pyplot as plt
from linproc import linearProcess
import estspec

lp = linearProcess(-0.9)
wl = 65

fig, ax = plt.subplots(3, 1)

for i in range(3):
    X = lp.simulation(ts_length=150)
    ax[i].set_xlim(0, np.pi)
    x_sd, y_sd = lp.spectral_density(two_pi=False, resolution=180)
    ax[i].semilogy(x_sd, y_sd, 'r-', lw=2, alpha=0.75,
                   label='spectral density')
    x, y_smoothed = estspec.periodogram(X, window='hamming', window_len=wl)
    ax[i].semilogy(x, y_smoothed, 'k-', lw=2, alpha=0.75,
                   label='standard smoothed periodogram')
    x, y_ar = estspec.ar_periodogram(X, window='hamming', window_len=wl)
    ax[i].semilogy(x, y_ar, 'b-', lw=2, alpha=0.75,
                   label='AR smoothed periodogram')
    ax[i].legend(loc='upper left')

plt.show()

6.21 Exercises from Continuous State Markov Chains


Solution to Exercise 1
""" Look ahead estimation of a TAR stationary density, where the TAR model is X' = theta |X| + sqrt(1 - theta^2) xi and xi is standard normal. Try running at n = 10, 100, 1000, 10000 to get an idea of the speed of convergence. """ import numpy as np from scipy.stats import norm, gaussian_kde import matplotlib.pyplot as plt from lae import lae phi = norm()

T HOMAS S ARGENT AND J OHN S TACHURSKI

February 5, 2014

6.21. EXERCISES FROM CONTINUOUS STATE MARKOV CHAINS

449

n = 500 theta = 0.8 # == Frequently used constants == # d = np.sqrt(1 - theta**2) delta = theta / d def psi_star(y): "True stationary density of the TAR Model" return 2 * norm.pdf(y) * norm.cdf(delta * y) def p(x, y): "Stochastic kernel for the TAR model." return phi.pdf((y - theta * np.abs(x)) / d) / d Z = phi.rvs(n) X = np.empty(n) for t in range(n-1): X[t+1] = theta * np.abs(X[t]) + d * Z[t] psi_est = lae(p, X) k_est = gaussian_kde(X) fig, ax = plt.subplots() ys = np.linspace(-3, 3, 200) ax.plot(ys, psi_star(ys), 'b-', lw=2, alpha=0.6, label='true') ax.plot(ys, psi_est(ys), 'g-', lw=2, alpha=0.6, label='look ahead estimate') ax.plot(ys, k_est(ys), 'k-', lw=2, alpha=0.6, label='kernel based estimate') ax.legend(loc='upper left') plt.show()

Solution to Exercise 2
Here's one program that does the job
import numpy as np
from scipy.stats import lognorm, beta
import matplotlib.pyplot as plt
from lae import lae

# == Define parameters == #
s = 0.2
delta = 0.1
a_sigma = 0.4  # A = exp(B) where B ~ N(0, a_sigma)
alpha = 0.4    # f(k) = k^{\alpha}

phi = lognorm(a_sigma)

def p(x, y):
    "Stochastic kernel, vectorized in x.  Both x and y must be positive."
    d = s * x**alpha
    return phi.pdf((y - (1 - delta) * x) / d) / d

n = 1000  # Number of observations at each date t
T = 40    # Compute density of k_t at 1,...,T

fig, axes = plt.subplots(2, 2)
axes = axes.flatten()
xmax = 6.5

for i in range(4):
    ax = axes[i]
    ax.set_xlim(0, xmax)
    psi_0 = beta(5, 5, scale=0.5, loc=i*2)  # Initial distribution

    # == Generate matrix s.t. t-th column is n observations of k_t == #
    k = np.empty((n, T))
    A = phi.rvs((n, T))
    k[:, 0] = psi_0.rvs(n)
    for t in range(T-1):
        k[:, t+1] = s * A[:, t] * k[:, t]**alpha + (1 - delta) * k[:, t]

    # == Generate T instances of lae using this data, one for each t == #
    laes = [lae(p, k[:, t]) for t in range(T)]

    ygrid = np.linspace(0.01, xmax, 150)
    greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
    greys.reverse()
    for psi, g in zip(laes, greys):
        ax.plot(ygrid, psi(ygrid), color=g, lw=2, alpha=0.6)
    #ax.set_xlabel('capital')
    #title = r'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={}$'
    #ax.set_title(title.format(T))

plt.show()

6.22 Exercises from Optimal Savings


Solution to Exercise 1
from matplotlib import pyplot as plt
from ifp import *

m = consumerProblem()
K = 80

# Bellman iteration
V, c = initialize(m)
print "Starting value function iteration"
for i in range(K):
    print "Current iterate = " + str(i)
    V = bellman_operator(m, V)
c1 = bellman_operator(m, V, return_policy=True)

# Policy iteration
print "Starting policy function iteration"
V, c2 = initialize(m)
for i in range(K):
    print "Current iterate = " + str(i)
    c2 = coleman_operator(m, c2)

fig, ax = plt.subplots()
ax.plot(m.asset_grid, c1[:, 0], label='value function iteration')
ax.plot(m.asset_grid, c2[:, 0], label='policy function iteration')
ax.set_xlabel('asset level')
ax.set_ylabel('consumption (low income)')
ax.legend(loc='upper left')
plt.show()

Solution to Exercise 2
from compute_fp import compute_fixed_point
from matplotlib import pyplot as plt
import numpy as np
from ifp import coleman_operator, consumerProblem, initialize

r_vals = np.linspace(0, 0.04, 4)

fig, ax = plt.subplots()
for r_val in r_vals:
    cp = consumerProblem(r=r_val)
    v_init, c_init = initialize(cp)
    c = compute_fixed_point(coleman_operator, cp, c_init)
    ax.plot(cp.asset_grid, c[:, 0], label=r'$r = %.3f $' % r_val)

ax.set_xlabel('asset level')
ax.set_ylabel('consumption (low income)')
ax.legend(loc='upper left')
plt.show()

Solution to Exercise 3
from matplotlib import pyplot as plt
import numpy as np
from ifp import consumerProblem, coleman_operator, initialize
from compute_fp import compute_fixed_point
from scipy import interp
import mc_tools

def compute_asset_series(cp, T=500000):
    """
    Simulates a time series of length T for assets, given optimal
    savings behavior.  Parameter cp is an instance of consumerProblem
    """
    Pi, z_vals, R = cp.Pi, cp.z_vals, cp.R  # Simplify names
    v_init, c_init = initialize(cp)
    c = compute_fixed_point(coleman_operator, cp, c_init)
    cf = lambda a, i_z: interp(a, cp.asset_grid, c[:, i_z])
    a = np.zeros(T+1)
    z_seq = mc_tools.sample_path(Pi, sample_size=T)
    for t in range(T):
        i_z = z_seq[t]
        a[t+1] = R * a[t] + z_vals[i_z] - cf(a[t], i_z)
    return a

if __name__ == '__main__':
    cp = consumerProblem(r=0.03, grid_max=4)
    a = compute_asset_series(cp)
    fig, ax = plt.subplots()
    ax.hist(a, bins=20, alpha=0.5, normed=True)
    ax.set_xlabel('assets')
    ax.set_xlim(-0.05, 0.75)
    plt.show()

Solution to Exercise 4
from matplotlib import pyplot as plt
import numpy as np
from compute_fp import compute_fixed_point
from ifp import coleman_operator, consumerProblem, initialize
from solution_ifp_ex3 import compute_asset_series

M = 25
r_vals = np.linspace(0, 0.04, M)
fig, ax = plt.subplots()

for b in (1, 3):
    asset_mean = []
    for r_val in r_vals:
        cp = consumerProblem(r=r_val, b=b)
        mean = np.mean(compute_asset_series(cp, T=250000))
        asset_mean.append(mean)
    ax.plot(asset_mean, r_vals, label=r'$b = %d $' % b)

ax.set_yticks(np.arange(.0, 0.045, .01))
ax.set_xticks(np.arange(-3, 2, 1))
ax.set_xlabel('capital')
ax.set_ylabel('interest rate')
ax.grid(True)
ax.legend(loc='upper left')
plt.show()


6.23 Exercises from Optimal Taxation


Solution to Exercise 1
import numpy as np
from numpy import array
from lqramsey import *

# == Parameters == #
beta = 1 / 1.05
rho, mg = .95, .35
A = array([[0, 0, 0, rho, mg*(1-rho)],
           [1, 0, 0, 0, 0],
           [0, 1, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 1]])
C = np.zeros((5, 1))
C[0, 0] = np.sqrt(1 - rho**2) * mg / 8
Sg = array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = array((0, 0, 0, 0, 0)).reshape(1, 5)
Sb = array((0, 0, 0, 0, 2.135)).reshape(1, 5)  # Chosen st. (Sc + Sg) * x0 = 1
Ss = array((0, 0, 0, 0, 0)).reshape(1, 5)

economy = Economy(beta=beta, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                  discrete=False, proc=(A, C))

T = 50
path = compute_paths(T, economy)
gen_fig_1(path)


CHAPTER SEVEN

FAQS / USEFUL RESOURCES


This page collects some FAQs, useful links, and commands.

7.1 FAQs

7.2 How do I install Python?


See this lecture

7.3 How do I start Python?


Run one of these commands in the system terminal (i.e., terminal, command prompt, CMD, PowerShell, etc., depending on your OS):

python              the basic Python shell (actually, don't use it; see the next command)
ipython             a much better Python shell
ipython notebook    start IPython Notebook on your local machine

See here for more details on running Python

7.4 How can I get help on a Python command?


See this discussion

7.5 Where do I get all the Python programs from the lectures?
Visit our public code repository: https://github.com/jstac/quant-econ

See this lecture for the best way to download the programs


7.6 What's Git?


To learn about what Git is and how to use it, watch the videos or read the documentation here.

To install Git and grab the main repository, read the discussion here.

7.7 Other Resources

7.8 IPython Magics


Common IPython commands (IPython magics):

run foo.py             run the file foo.py
pwd                    show present working directory
ls                     list contents of present working directory
cd dir_name            change to directory dir_name
cd ..                  go back
loadpy file_name.py    load file_name.py into a cell

7.9 IPython Cell Magics


These are for use in the IPython notebook:

%%file new_file.py    put at the top of a cell to save the cell's contents as new_file.py

7.10 Useful Links


Wakari            cloud computing with an IPython Notebook interface
Sagemath Cloud    another cloud computing environment that runs Python
Quandl            a Python interface to Quandl
Econforge         developing open-source tools for computational economics

PDF Lectures
Lecture 1
Lecture 2
Lecture 3

CHAPTER EIGHT

REFERENCES

Acknowledgements: These lectures have benefited greatly from comments and suggestions from our colleagues, students and friends. Special thanks go to Anmol Bhandari, Jeong-Hun Choi, Chase Coleman, Doc-Jin Jang, Spencer Lyon, Matthew McKay, Tomohito Okabe, Alex Olssen, Nathan Palmer and Yixiao Zhou.


