
flowingdata.com

The Baseline

by Nathan Yau, Nov. 26, 2013

The left lane on the freeway, commonly known as the fast lane, is sometimes mistaken for the slowest lane on the planet. It's especially weird when there aren't many cars on the road, and you're driving in the fast lane only to find yourself slowed down by the person in front of you, who moves at 60% of the speed limit.

The decent thing to do is for the slow person to switch to the right lane so that you can pass. But instead, he carries on with his slowness, so after a mile or two, you switch lanes to pass. You speed up, and then all of a sudden he speeds way up, only to make you the slow one.

I bet I'm the slow one in front more than I am the annoyed one in the back.

Some people are just jerks, but speaking from experience, I think this happens because your baseline for the speed limit is composed of inanimate objects, such as the road, trees, and signs. It feels like you're going fast until you see someone drive much faster behind you. The baseline for your speed moves up, and your current speed suddenly feels slow.

I've noticed this baseline shift a lot recently with a baby in the house. I used to sleep around 1am, and now 9:30pm seems late; a television with the volume at 15 seemed just right, and now the baseline is at 9; and an everyday errand like grabbing donuts morphed into an adventure.

Nothing changed physically. The clock still ticks at the same speed, the television volume hasn't gone haywire, and the donut shop is in the same place as it's always been. But everything looks and feels different.

It's kind of like the classic Powers of Ten clip that starts tiny and zooms out farther and farther. Everything looks significant when you look at it from the right angle.

The Known Universe is also a good one, as is The Simpsons parody.

Although the source data in the examples that follow is time series, this applies to other data types too. When you look at data, it's important to consider this baseline: the imaginary place or point you want to compare to. Of course, the right answer is different for various data sets, with variable context, but let's look at some practical examples in R.

You don't have R on your computer yet? You can just follow along loosely, or you can download and install R, download the source linked above, and follow the code snippets.

So first you have to load the data, which is in CSV format. Use `read.csv()` to bring it in. We're going to look at the cost of gas, eggs, and the Consumer Price Index, as published by the Bureau of Labor Statistics.

```r
# Load the data.
cpi <- read.csv("data/cpi-monthly-us.csv", stringsAsFactors=FALSE)
eggs <- read.csv("data/egg-prices-monthly.csv", stringsAsFactors=FALSE)
gas <- read.csv("data/gas-prices-monthly.csv", stringsAsFactors=FALSE)
```

Say you're interested in how gas prices have changed over time. A time series chart is the most straightforward thing to do.

```r
# Regular time series for gas price
par(cex.axis=0.7)
gas.ts <- ts(gas$Value, start=c(1976, 1), frequency=12)
plot(gas.ts, xlab="", ylab="", main="Dollars per gallon", las=1, bty="n")
```
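In the `ts()` call above, `start=c(1976, 1)` means January 1976 and `frequency=12` means twelve observations per year. A minimal sketch on made-up prices (hypothetical values, not the BLS series) shows how that index maps to calendar positions, and how `window()` pulls out a sub-period by year and month instead of by vector position.

```r
# Hypothetical monthly prices, for illustration only (not the BLS data)
prices <- c(0.59, 0.60, 0.61, 0.62, 0.63, 0.64,
            0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71)
prices.ts <- ts(prices, start=c(1976, 1), frequency=12)

# time() gives fractional years: January 1976 is 1976.0, February is 1976 + 1/12, ...
print(time(prices.ts)[1:3])

# Thirteen months starting January 1976 end in January 1977
print(end(prices.ts))

# window() selects by calendar position: here, January 1977 only
jan77 <- window(prices.ts, start=c(1977, 1), end=c(1977, 1))
print(jan77)
```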

As you might expect, the price rises, with a dip in the 2000s. Your concept of the current dollar and of historical prices makes up your baseline.

Maybe, though, you care more about the monthly percentage changes than about the actual price. You want to shift the baseline to zero and look at percentages. The code below takes the gas prices except for the first value (`curr`), then the prices except for the last value (`prev`), and then subtracts and divides. If the change is negative, meaning the price dropped from the previous month, the bar is colored green. Otherwise, bars are gray.

```r
# Monthly change
curr <- gas$Value[-1]
prev <- gas$Value[1:(length(gas$Value)-1)]
monChange <- 100 * round( (curr-prev) / prev, 2 )
barCols <- sapply(monChange, function(x) {
  if (x < 0) { return("#2cbd25") } else { return("gray") }
})
barplot(monChange, border=NA, space=0, las=1, col=barCols, main="% change, monthly")
```
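The `curr`/`prev` slicing above computes the same thing as base R's `diff()`, which subtracts each element from the one after it. A minimal sketch on hypothetical prices compares the two forms, and swaps the `sapply()` color loop for a vectorized `ifelse()`. One difference to note: the original rounds the fractional change with `round(..., 2)` before multiplying by 100, while this sketch keeps full precision.

```r
# Hypothetical monthly prices (made up for illustration)
x <- c(2.00, 2.10, 2.05, 2.05, 2.25)

# diff(x) is x[-1] - x[-length(x)]; head(x, -1) drops the last element
pctChange <- 100 * diff(x) / head(x, -1)

# The curr/prev form from the article, for comparison
curr <- x[-1]
prev <- x[1:(length(x)-1)]
pctChange2 <- 100 * (curr - prev) / prev

# Vectorized coloring: green for drops, gray otherwise (including no change)
barCols <- ifelse(pctChange < 0, "#2cbd25", "gray")
print(round(pctChange, 2))
print(barCols)
```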

This is noisy, though. Maybe a year-over-year change would be more useful.

```r
# Annual (year-over-year) change
curr <- gas$Value[-(1:12)]
prev <- gas$Value[1:(length(gas$Value)-12)]
annChange <- 100 * round( (curr-prev) / prev, 2 )
barCols <- sapply(annChange, function(x) {
  if (x < 0) { return("#2cbd25") } else { return("gray") }
})
barplot(annChange, border=NA, space=0, las=1, col=barCols, main="% change, annual")
```
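`diff()` also takes a `lag` argument, so the year-over-year version can be written as `diff(x, lag=12)`: each month is compared with the same month one year earlier. A sketch on 24 hypothetical monthly values, flat for one year and then flat at a higher level:

```r
# Hypothetical two years of monthly prices: 2.00 all year, then 2.50 all year
x <- c(rep(2.00, 12), rep(2.50, 12))

# Each month minus the same month 12 months earlier, relative to the earlier value
annChange <- 100 * diff(x, lag=12) / head(x, -12)

# One value per month of the second year
print(annChange)
```

Every second-year month is 25% above the same month a year earlier, so the flat step shows up as twelve equal bars rather than a single spike.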

The magnitude of price drops is more visible this way.

Maybe, though, your baseline is the current gas price, and you want to know how all past prices compare to now. Take the most recent price and subtract it from all the others.

```r
# Difference from the most recent price
curr <- gas$Value[length(gas$Value)]
gasDiff <- gas$Value - curr
barCols.diff <- sapply(gasDiff, function(x) {
  if (x < 0) { return("gray") } else { return("black") }
})
barplot(gasDiff, border=NA, space=0, las=1, col=barCols.diff,
        main="Dollar difference from September 2013")
```
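Subtracting a single number from a vector recycles it across every element, so the "distance from today" baseline takes one line. A sketch with hypothetical prices, using `tail()` to grab the most recent value:

```r
# Hypothetical monthly prices; the last one plays the role of "today"
x <- c(1.20, 3.10, 2.80, 3.50, 3.30)
today <- tail(x, 1)

# Positive = more expensive than today, negative = cheaper
diffFromToday <- x - today

# Same rule as the article: gray below zero, black at or above zero
barCols.diff <- ifelse(diffFromToday < 0, "gray", "black")
print(round(diffFromToday, 2))
print(barCols.diff)
```

Note that the current month always has a difference of exactly zero, so under this rule its own bar is colored black (with zero height).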

Black bars, or a positive difference, show when gas was more expensive relative to the present.

There's a problem, though. When you compare historical prices, you have to account for inflation. The baseline is not only how much gas costs now, but how much a dollar is worth. A dollar today isn't worth the same as a dollar thirty years ago.

This is where the Consumer Price Index comes into play. It measures the average change over time in the prices that consumers pay for a basket of goods and services. Divide today's CPI by the CPI at a different time and you get a multiplication factor to estimate the adjusted price per gallon of gas. In other words, you want to know how much a gallon of gas during a past year would cost in today's dollars.
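The CPI ratio described above can be written out, with $p_t$ the price per gallon in month $t$ and $T$ the most recent month, as:

$$
p_t^{\text{adj}} = p_t \times \frac{\mathrm{CPI}_T}{\mathrm{CPI}_t}
$$

As a worked example with made-up index values, if $\mathrm{CPI}_t = 100$ and $\mathrm{CPI}_T = 200$, the factor is $200 / 100 = 2$, so a gallon that cost one dollar in month $t$ corresponds to two dollars today.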

The code below computes the adjusted price.

```r
# Adjust gas price for inflation
gas.cpi.merge <- merge(gas, cpi, by=c("Year", "Period"))
gas.cpi <- gas.cpi.merge[,-c(3,5)]
colnames(gas.cpi) <- c("year", "month", "gasprice.unadj", "cpi")

# CPI in the most recent month (last row after merging)
currCPI <- gas.cpi[dim(gas.cpi)[1], "cpi"]

# Convert each month's price into current dollars
gas.cpi$cpiFactor <- currCPI / gas.cpi$cpi
gas.cpi$gasprice.adj <- gas.cpi$gasprice.unadj * gas.cpi$cpiFactor
```
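To see the factor at work without the BLS files, here is a minimal sketch on made-up numbers (hypothetical prices and index values), assuming the rows are sorted oldest to newest so that the last row is the current month, as in the merged data above:

```r
# Hypothetical gas prices and CPI values, oldest to newest
toy <- data.frame(
  gasprice.unadj = c(1.00, 1.50, 3.00),
  cpi            = c(100, 150, 240)
)

# Baseline: the most recent CPI value (last row)
currCPI <- toy$cpi[nrow(toy)]

# Convert each past price into current dollars
toy$cpiFactor    <- currCPI / toy$cpi
toy$gasprice.adj <- toy$gasprice.unadj * toy$cpiFactor
print(toy)
```

Here the two older prices both come out to 2.40 in current dollars, even though they look different in nominal terms, and both sit below today's 3.00.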

Now you can make the same graphs as before, but with adjusted prices.

```r
# Adjusted difference from the most recent price
curr <- gas.cpi$gasprice.adj[dim(gas.cpi)[1]]
gasDiff.adj <- gas.cpi$gasprice.adj - curr
barCols.diff.adj <- sapply(gasDiff.adj, function(x) {
  if (x < 0) { return("gray") } else { return("black") }
})
barplot(gasDiff.adj, border=NA, space=0, las=1, col=barCols.diff.adj,
        main="Adjusted dollar difference from September 2013")
```

The price per gallon of gas is relatively high these days, but now you see something else in previous decades: for a short while, gas was relatively more expensive than it is today. The price hasn't just been a steady increase.

Let's try the same thing with the annual percentage change.

```r
# Adjusted annual change
curr <- gas.cpi$gasprice.adj[-(1:12)]
prev <- gas.cpi$gasprice.adj[1:(length(gas.cpi$gasprice.adj)-12)]
annChange.adj <- 100 * round( (curr-prev) / prev, 2 )
barCols.adj <- sapply(annChange.adj, function(x) {
  if (x < 0) { return("#2cbd25") } else { return("gray") }
})
barplot(annChange.adj, border=NA, space=0, las=1, col=barCols.adj,
        main="% change, annual adjusted")
```

Again, you see a different pattern during the 1980s, because the baseline is properly shifted.

Finally, compare the straightforward time series charts for adjusted and unadjusted dollars.

```r
# Adjusted time series
par(mfrow=c(2,1), mar=c(4,3,2,2))
gas.ts.adj <- ts(gas.cpi$gasprice.adj, start=c(1976, 1), frequency=12)
plot(gas.ts, xlab="", ylab="", main="Dollars per gallon, unadjusted", las=1, bty="n")
plot(gas.ts.adj, xlab="", ylab="", main="Dollars per gallon, adjusted", las=1, bty="n")
```

Inflation adjustment isn't the only way to gauge the magnitude of change, though. You just need something to compare against. The price of gas increased. Did everything else increase in cost? Try a comparison of the price of gas and the price of a dozen eggs.

```r
# Gas versus eggs: merge the data
gas.eggs.merge <- merge(gas, eggs, by=c("Year", "Period"))
gas.eggs <- gas.eggs.merge[,-c(3,5)]
colnames(gas.eggs) <- c("year", "month", "gas", "eggs")
gas.ts <- ts(gas.eggs$gas, start=c(1980, 1), frequency=12)
eggs.ts <- ts(gas.eggs$eggs, start=c(1980, 1), frequency=12)

# Plot it
par(bty="n", las=1)
ts.plot(gas.ts, eggs.ts, col=c("dark gray", "black"), ylim=c(0, 4),
        main="Price for dozen of eggs vs. gallon of regular gas, unadjusted",
        ylab="Dollars")
text(1980, 1.6, "Gas", pos=4, cex=0.7, col="dark gray")
text(1980, 0.5, "Eggs", pos=4, cex=0.7, col="black")
```
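`merge()` does an inner join by default, keeping only the year-month pairs present in both tables, which is presumably why the egg comparison starts in 1980 rather than 1976. The `[,-c(3,5)]` step relies on the column order `merge()` produces: key columns first, then the first table's remaining columns, then the second's. A sketch with two tiny hypothetical tables; the `Year`/`Period`/`Value` names mirror the code above, but the `Series.ID` column and every value are assumptions made up for illustration.

```r
# Hypothetical tables: gas has three months, eggs only the last two
gas.toy <- data.frame(Series.ID = "GAS", Year = 1980,
                      Period = c("M01", "M02", "M03"),
                      Value = c(1.10, 1.15, 1.20),
                      stringsAsFactors = FALSE)
eggs.toy <- data.frame(Series.ID = "EGGS", Year = 1980,
                       Period = c("M02", "M03"),
                       Value = c(0.85, 0.90),
                       stringsAsFactors = FALSE)

# Inner join on year and month: only M02 and M03 survive
merged <- merge(gas.toy, eggs.toy, by = c("Year", "Period"))

# Columns are Year, Period, Series.ID.x, Value.x, Series.ID.y, Value.y,
# so dropping columns 3 and 5 removes the two ID columns
gas.eggs.toy <- merged[, -c(3, 5)]
colnames(gas.eggs.toy) <- c("year", "month", "gas", "eggs")
print(gas.eggs.toy)
```

If the real CSV files have a different column layout, the hard-coded `-c(3,5)` would drop the wrong columns, so it is worth checking `colnames()` of the merged table first.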

This gives you a better sense of the magnitude of gas price changes than if you were to look at it without any other context.

Here's one more look, but this time as a ratio of egg price to gas price.

```r
# Eggs to gas ratio
eggs.gas.ratio <- ts(gas.eggs$eggs/gas.eggs$gas, start=c(1980, 1), frequency=12)
par(cex.axis=0.7)
plot(eggs.gas.ratio, bty="n", las=1, ylab="", main="Price of eggs to gas")
lines(c(1970, 2015), c(1,1), lty=2, lwd=0.5, col="gray")
text(1979, 1.11, "Eggs cost more", cex=0.6, pos=4, offset=0)
text(1979, 0.89, "Gas costs more", cex=0.6, pos=4, offset=0)
```
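Because the ratio's baseline is exactly 1, a logical comparison picks out the periods on each side of the dashed line. A sketch with hypothetical prices:

```r
# Hypothetical monthly prices for a dozen eggs and a gallon of gas
eggs.p <- c(0.90, 1.00, 1.20, 1.30)
gas.p  <- c(1.20, 1.10, 1.10, 1.40)

ratio <- eggs.p / gas.p

# Above 1: eggs cost more; below 1: gas costs more
eggsCostMore <- ratio > 1
print(round(ratio, 3))

# Positions where eggs cost more than gas
print(which(eggsCostMore))
```

On the plotting side, base R's `abline(h=1, lty=2)` draws the same kind of dashed reference line across the full width of the plot without hard-coding an x range like `c(1970, 2015)`.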

The 1.0 baseline makes it easy to spot when gas was more expensive and vice versa.

Wrapping up
Whether you work with temporal data, categorical data, rankings, or something else, always consider your baseline. Does it make sense? Are your comparisons valid? The wrong baseline can lead to exaggerated results or underrepresented ones, so you must be careful. And we haven't even touched on uncertainty yet.

Original URL: http://flowingdata.com/2013/11/26/the-baseline/
