Driveline Mechanics: An SB Nation Community


JustVORP versus Just VORP: The (Sort of) Big Rundown

As one of Baseball Prospectus' flagship stats, VORP (Value Over Replacement Player) is probably best known to the general public as some nerdly invention that infuriates crusty old deans of sports journalism like Murray Chass and Jon Heyman. "The" internet responded in predictable (if predictably hilarious) fashion, in blog posts undoubtedly typed with 'faces born of contempt.'

I'm not writing to continue that tired non-debate or to make fun of those crotchety ol' "mainstream media" types. Rather, I'm going to review some legitimate criticisms of VORP for position players (there are other issues for pitchers) and see what sort of practical weight the problems of VORP might have. That is, does it matter beyond just a "few runs" that internet sabermetricians are carping about?


In an earlier column, I discussed and utilized Justin Inaz's Total Value spreadsheet for 2008, a very carefully constructed "überstat" for evaluating individual players. There are many cool things about the spreadsheet that I discuss in the earlier article. You can also find it referenced a fair bit elsewhere on the 'Net.

One prominent use of Justin's numbers is Sky Kalkman's series at Beyond the Box Score reviewing the best players of the 2008 season at each position. What's even cooler is that Justin has allowed Sky access to his spreadsheet. Using the data, Sky put together a "JustVORP" spreadsheet. Basically, Sky just leaves Justin's defensive stuff (well-done in its own right) out, and gives a list of each player's positionally-adjusted runs above replacement level. Thus, it neatly parallels Baseball Prospectus's VORP (created by Keith Woolner), which also does not include defense, but gives the positionally-adjusted offensive runs above replacement for individual players.

I am certainly not dismissing defense (it's the 'New Moneyball,' after all). However, there is so much disagreement about how defense should be evaluated (whether or not stats should be used at all, and if so, which ones) that using stats like VORP (and JustVORP), which give everything but defense, allows the user to figure in the defensive adjustments in his or her own fashion when evaluating a player.

Just to make this clear again: I am not saying that we should forget about defensive stats. I am simply leaving them aside for the purposes of this column -- to see how much VORP does or doesn't go wrong in evaluating a player's relative offensive contributions. Along the same lines, this shouldn't be taken as an endorsement of positionally adjusted offensive stats taken on their own. I am very much against a "this player doesn't hit well enough to be a corner outfielder" sort of thinking that doesn't take defense into account -- see Dave Cameron's excellent write-up on that topic. I'm using position-adjusted offensive stats here just to see where VORP goes wrong, considered on its own terms.

So what are the differences between JustVORP and VORP? Why prefer one to the other? Sky summarizes it well in his JustVORP post, which you should read, but I will run through some of the issues briefly:

  • BP's VORP uses Marginal Lineup Value as its base run estimator. As has been shown in various places, MLV undervalues walks by half. That's right, Baseball Prospectus, the avant-garde of the OBP revolution (well, maybe 40-50 years after Branch Rickey), has the value of walks wrong in one of its flagship stats. Justin's stats, on the other hand, use BaseRuns (discussed in many helpful places -- see Colin Wyers' "Batman" series listed at the end of this article for one) to establish the correct linear weight of each event. In short, BaseRuns is a dynamic run estimator that allows for the changing run expectancy of each baseball event in different run environments -- even the most extreme, where even linear weights break down (as for Runs Created and MLV, one might say they were broken to begin with...).
  • JustVORP adjusts for the superiority of AL pitching, VORP does not.
  • VORP defines replacement level in relation to the league offensive average at each position for the season. This has the weird result of making DHs seem more valuable than 1Bs, because the average DH isn't hitting well. Yet we know that DH is the easiest position to "replace," and is thus (all things being equal) the least valuable. Just because teams aren't making good use of their DH spot doesn't change that fact. That's just the most obvious example -- in some seasons, the average CF hits better than the average LF (think of the 1950s). But that certainly doesn't mean that the LFers are harder to find and replace, because all CFers can play LF, but not vice versa. VORP (in some eras) might lead one to think otherwise. JustVORP avoids this by using Tom Tango's positional adjustments (prorated for playing time, of course): Catcher +12.5, SS +7.5, 2B/3B/CF +2.5, LF/RF -7.5, 1B -12.5, DH -17.5.
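To make the "prorated for playing time" idea in the last bullet concrete, here is a minimal Python sketch. The adjustment values are the ones quoted above; the full-season denominator (`FULL_SEASON_PA = 700`) and the function name `prorated_adjustment` are assumptions for illustration only, since the article does not say how JustVORP defines a full season.

```python
# Tom Tango's full-season positional adjustments (in runs), as quoted above.
POSITIONAL_ADJUSTMENT = {
    "C": 12.5, "SS": 7.5,
    "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5,
    "1B": -12.5, "DH": -17.5,
}

# ASSUMPTION: the article does not define a "full season" of playing time;
# 700 plate appearances is a placeholder, not JustVORP's actual denominator.
FULL_SEASON_PA = 700


def prorated_adjustment(position, plate_appearances,
                        full_season_pa=FULL_SEASON_PA):
    """Scale the full-season adjustment by the player's share of a season."""
    return POSITIONAL_ADJUSTMENT[position] * plate_appearances / full_season_pa


if __name__ == "__main__":
    # A half-season catcher earns half of the +12.5 runs; a half-season DH
    # is docked half of the -17.5 runs.
    for pos in ("C", "1B", "DH"):
        print(pos, prorated_adjustment(pos, 350))
```

The point of the design is that the adjustment depends only on the position's place on the defensive spectrum and on playing time, not on how well that position happened to hit in a given season.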

But what is the practical effect of all this? Is this just sabermetric nitpicking? In one discussion on the Book Blog, comparing VORP with BP's RARP (runs above replacement using Equivalent Runs -- a better run estimator anyway that approximates the correct linear weights), Colin Wyers found that RARP itself revealed VORP to be undervaluing not only players who walked a lot, but also people who hit a higher than usual proportion of doubles. The average VORP "error" along these lines for regulars was about 5.8 runs, and about 6.6 for the doubles hitters. That's a pretty big error, once you realize that it's more than half a win -- and teams are paying between four and five million dollars for each marginal win in 2009. And that's just addressing the problem in terms of the linear weights.

What does this come to in concrete terms? For the sake of discussion, let's assume that JustVORP is on the right track in terms of run estimation, positional adjustment, etc. How big of a difference does just using plain ol' VORP make? Let's take the top ten position players in baseball according to VORP (rounded to the nearest run for reasons having to do with my less-than-brilliant abilities in importing data into MySQL):


Name              2008 VORP
Albert Pujols     99
Hanley Ramirez    79
Chipper Jones     76
Lance Berkman     73
David Wright      66
Chase Utley       63
Alex Rodriguez    63
Jose Reyes        63
Matt Holliday     62
Dustin Pedroia    60

Not too many surprises there. But now, let's compare BP's VORP numbers with the JustVORP numbers for the same players:


Name              VORP   JustVORP   Difference
Albert Pujols     99     78         +21
Hanley Ramirez    79     76         +3
Chipper Jones     76     68         +8
Lance Berkman     73     57         +16
David Wright      66     65         +1
Chase Utley       64     56         +8
Alex Rodriguez    63     62         +1
Jose Reyes        63     59         +4
Matt Holliday     62     46         +16
Dustin Pedroia    60     53         +7

Not all of those are minor differences. Albert Pujols is almost certainly the best player in professional baseball at the moment, but overrating him by about 20 runs -- well, if a team were using VORP to evaluate what they should pay a player, that's around a $10 million difference. Berkman's and Holliday's ratings are almost as problematic. Note that all three of those players are from the "easy" end of the defensive spectrum, as well as being from the NL.
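The run-to-dollar arithmetic behind that figure can be sketched in a few lines of Python. The conversion of roughly ten runs per win is an assumption (it is the rule of thumb implied by "5.8 runs ... more than half a win" above), and $5 million per win is the upper end of the range quoted earlier; `dollar_gap` is a hypothetical helper name.

```python
# ASSUMPTION: about 10 runs per win, the rule of thumb implied in the text.
RUNS_PER_WIN = 10
# Upper end of the "four to five million dollars" per marginal win cited above.
DOLLARS_PER_WIN = 5_000_000

# (VORP, JustVORP) pairs for the three largest gaps in the table above.
PLAYERS = {
    "Albert Pujols": (99, 78),
    "Lance Berkman": (73, 57),
    "Matt Holliday": (62, 46),
}


def dollar_gap(vorp, justvorp,
               runs_per_win=RUNS_PER_WIN, dollars_per_win=DOLLARS_PER_WIN):
    """Dollar value of the run gap between the two ratings."""
    return (vorp - justvorp) * dollars_per_win / runs_per_win


if __name__ == "__main__":
    for name, (vorp, justvorp) in PLAYERS.items():
        gap = dollar_gap(vorp, justvorp)
        print(f"{name}: {vorp - justvorp:+d} runs, about ${gap / 1e6:.1f} million")
```

At the lower $4 million per win figure, the same gaps shrink by a fifth, so the exact dollar amounts are only as good as the assumed conversion rates.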

I do not want to lose myself in statistical detail that I'm not qualified to go into anyway. The best thing to do would be to go through multiple years and check the standard deviations, the variance, the error, and (as Colin did) the average error. What I will do is easier and more straightforward, if not as definitive. To check out the "positional biases," I went position-by-position (primary position as defined by VORP for players who played multiple positions), and took all the players with at least 500 plate appearances (400 for catchers and DHs). Then I averaged the difference between VORP and JustVORP by player. Again, this is a measurement of the average over/underrating of each position. It is not the measure of the variance. It is not a multi-year study, and the sample sizes are low. So these only allow us, at best, to form some preliminary hypotheses about VORP's positional biases.
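The averaging procedure just described can be sketched roughly as follows. The rows below are hypothetical placeholders, not the real 2008 data, and the names `min_pa` and `average_gap_by_position` are made up for this illustration; only the plate-appearance cutoffs (500, or 400 for catchers and DHs) come from the text.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL rows for illustration only, not real 2008 numbers:
# (name, primary position, plate appearances, VORP, JustVORP)
ROWS = [
    ("Player A", "1B", 620, 40, 32),
    ("Player B", "1B", 510, 22, 17),
    ("Player C", "1B", 300, 10, 2),   # under 500 PA, excluded
    ("Player D", "C", 430, 15, 20),
    ("Player E", "C", 380, 8, 14),    # under 400 PA, excluded
    ("Player F", "DH", 450, 25, 14),
]


def min_pa(position):
    """Playing-time cutoff from the text: 400 PA for C and DH, else 500."""
    return 400 if position in ("C", "DH") else 500


def average_gap_by_position(rows):
    """Mean of (VORP - JustVORP) per position among qualifying players."""
    gaps = defaultdict(list)
    for _name, position, pa, vorp, justvorp in rows:
        if pa >= min_pa(position):
            gaps[position].append(vorp - justvorp)
    return {position: mean(values) for position, values in gaps.items()}


if __name__ == "__main__":
    for position, gap in average_gap_by_position(ROWS).items():
        print(f"{position}: {gap:+.1f}")
```

A positive average means VORP rates the position higher than JustVORP does. As noted above, a mean alone says nothing about the spread, so a fuller study would report the variance as well.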


Position   Difference
DH         +10.8
1B         +6.8
2B         -0.8
3B         -2.7
SS         +2.1
LF         +2.0
RF         +0.1
CF         -1.1
C          -4.7

There it is. While the sample size is too small and the method is too crude in our little analysis to say for sure, so far, the hypothesis that VORP overvalues 1Bs and DHs -- the easiest-to-replace part of the talent pool -- and undervalues everyone else, particularly catchers and third basemen, is borne out. While one or two runs isn't that bad, 7 runs per first baseman is.

(To repeat: we're leaving defense aside for the sake of getting at the offensive and positional adjustment difference. A complete player evaluation must take defense into account, of course.)

Let's take some concrete cases. Ryan Howard, for example, just got a contract extension from the Phillies that will buy out his remaining arbitration years. I won't give a full analysis here. Matthew Carruth argues that Howard probably got overpaid. That almost never happens when a player gets bought out pre-free agency, but it did in this case due to the (avoided) arbitration hearings giving too much weight to home runs and RBI and not considering those things that VORP is supposed to look at, such as positional adjustments. But the terms of the signing don't concern us directly here. If you were a GM, and just used VORP from BP (assume defense to be equal), what would your evaluation of Howard's 2008 contribution over a replacement level first baseman be? 36 runs, or about 3.5 wins. Assuming $5 million per marginal win, that implies a salary of $17.5 million a year. JustVORP, on the other hand, says that Howard was 25 runs above a replacement first baseman. That's about a win, or $5 million, difference. That's a lot of money, even to a baseball team. And in this case, I think that VORP is probably overvaluing Howard for one or two reasons: 1) VORP proceeds from positional averages, not accounting for the fact that first base is more easily replaceable, even with players who normally play other positions, because it is the easiest position to play on the field; and 2) VORP does not adjust for the easier pitching in the National League.

A very different case is Akinori Iwamura, the second baseman of the Tampa Bay Rays. VORP values him at 17 runs above replacement, while JustVORP has him at 27 RAR. Again, that's about a win, or $5 million. The problem again is probably positional adjustment and league difficulty adjustment, and proportionally, for a player like Iwamura who clearly isn't a star, that's a big difference -- the difference (assuming average defense) between a 1.7 WAR, below average, stopgap starter who might need to have a replacement available coming up through the minors, and a 2.7 WAR slightly above average player (average being around 2 WAR) who is easily holding down his spot.

The point is clear. Tango, Wyers, and others have pointed out that the run estimator at the heart of VORP is flawed in its treatment of walks and also doubles. When considering positional adjustments and the difference in difficulty between leagues, even in the few cases we looked at here (the top ten players by VORP, as well as Howard and Iwamura) as well as the general "error" by position, it's clear that VORP misvalues more than a few players, and by more than a little. Sabermetrics is supposed to help teams get better evaluations of players. If the assumptions on which JustVORP (and similar "überstats" such as FanGraphs' WAR -- that comparison will be for another column) rests are true, then in comparison, VORP fails on that score.

Oh, and if you're a sabermetrically-oriented blogger, and don't want to become an out-of-touch old man, take a hint from Murray Chass and Jon Heyman. Stop using VORP. Don't worry, though. BaseRuns and wOBA will bug them even more.

[Brief Update: I have written a supplementary post comparing the VORP and JustVORP numbers for the 2008 Royals over at Royals Review. I don't want to overdo the Royals-related content here, but if you want another concrete comparison of how big the difference can be, there you have it...]


Comments


I wonder if BP is going to try to make VORP catch up with the rest of sabermetrics

Clearly VORP is behind the times and badly in need of a few different types of upgrades. But it doesn’t appear that BP is particularly interested in innovating in the field of statistical analysis. They still undertake statistical analysis, but only with existing tools. And mostly they just use their own tools, and are very hesitant to pick up innovations from others in the field. So they aren’t updating and they aren’t innovating. It’s pretty sad considering that site was once the gold standard in sabermetrics.

The immoderate moderator

by NYRoyal on Feb 17, 2009 7:02 AM PST

They did SuperVORP

But I don’t know enough about it to comment on the changes.

Webmaster of Driveline Mechanics
http://www.drivelinemechanics.com - An Unconventional Look at Scouting

by Kyle Boddy on Feb 17, 2009 9:58 AM PST

SuperVORP is just VORP+FRAA. None of the underlying fundamentals of VORP have changed.

by cwyers on Feb 18, 2009 9:40 AM PST

hey, folks, it's our first celebrity

(well, from my point of view, and at least in one of my posts)

Nice to see you here, Colin.

Bringing you more-or-less replacement level analysis and commentary since sometime in 2008.

by devil_fingers on Feb 18, 2009 10:32 AM PST

oh yeah

Adding FRAA to VORP... THAT should make it better (winky face)

Colin, am I right when I assert that EqA/R doesn’t have the same problem with walks that VORP does? Are the weights pretty close to right? Since I made that remark, I’ve read some older stuff on both sides of the issue. I don’t want to confuse things any more with my stupidity…

by devil_fingers on Feb 18, 2009 10:35 AM PST

I didn't realize I was a celebrity.

I’m sure my wife would be surprised to hear it.

EqA is pretty close to right when it comes to the value of the walk, HR, etc. My main issue with EqA/EqR is that it’s simply inconvenient compared to a more plain-vanilla LWTS without actually being any better. There’s nothing WRONG with it, per se.

by cwyers on Feb 18, 2009 11:20 AM PST

Yeah, that's what I thought about EqA -- too complicated

You’re a celebrity to me because you got me into MySQL, which is a much more efficient way for me to waste time… That, and I like your columns.

At some point, I want to pick your brains about your BaseRuns version of FIP, which I have messed with a bit, but I want to make sure I’m doing it right.

Also, it’s not worth starting a new thread for: how bad are the B-R park factors that are in the BDB database? Are they useful at all, if I wanted to park-adjust some of my wOBA/LWTS figures? Are they the multi-year ones, or the single year?

Anyway, don’t mean to pick your brains too much here, just had some questions and no good place to ask, and maybe others can benefit if they read it here. I’ll hold off in begging for a Retrosheet tutorial…

by devil_fingers on Feb 18, 2009 12:04 PM PST

If you (or anyone else) have some sort of staty question you’d like answered (particularly when it concerns something I wrote), feel free to drop me a line at:

pontifexexmachina at hotmail.com

I try to keep up with that sort of e-mail, but sometimes I tend to fall behind on my correspondence.

The BDB park factors – I hope those are the three-year factors, but I don’t know. If they’re the three year, they’re probably better than no park adjustment. For applying park factors to LWTS, I prefer the linear method I outlined in my player value series over at THT. Essentially, take the park factor and divide by 100 to get the actual park factor.

Then take the average R/PA (normally about .12) and park adjust that – so if your park factor is 1.09, then you get:

1.09*.12=.1308

Then take:

.12-.1308=-0.0108

Then, for a hitter with 650 PAs, you’d simply take:

-0.0108*650=-7.02

And then you add that to the LWTS total, so if you had a player who was, say, +20 according to LWTS they become +13.

(A case could be made to apply the park factor per out instead of per PA. It’s something I haven’t really considered yet.)
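The per-PA arithmetic above can be written out as a short Python sketch. This is only an illustration of the steps as described in this comment, not code from the THT series; the function names are hypothetical, and the default of .12 runs per PA is the "normally about .12" figure quoted above.

```python
def park_adjustment_runs(park_factor_pct, plate_appearances,
                         league_r_per_pa=0.12):
    """Runs to add to a hitter's LWTS total to account for his home park.

    park_factor_pct is on the 100 scale (109 = 9% hitter-friendly park).
    """
    park_factor = park_factor_pct / 100            # e.g. 109 -> 1.09
    park_r_per_pa = park_factor * league_r_per_pa  # 1.09 * .12 = .1308
    per_pa_adjustment = league_r_per_pa - park_r_per_pa  # .12 - .1308
    return per_pa_adjustment * plate_appearances


def park_adjusted_lwts(lwts_runs, park_factor_pct, plate_appearances,
                       league_r_per_pa=0.12):
    """LWTS runs after applying the linear per-PA park adjustment."""
    return lwts_runs + park_adjustment_runs(
        park_factor_pct, plate_appearances, league_r_per_pa)


if __name__ == "__main__":
    # The worked example above: park factor 109, 650 PA, +20 LWTS.
    print(round(park_adjustment_runs(109, 650), 2))
    print(round(park_adjusted_lwts(20, 109, 650), 2))
```

Applying the factor per out instead of per PA, as the parenthetical suggests, would only change the multiplier passed in for playing time and the league rate it is paired with.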

A better set of park factors to use is Patriot’s historical park factors. He has 2007 and 2008 park factors available elsewhere on his site as well.

As for park adjusting wOBA, I haven’t really put much thought into it. I normally just divide by the park factor, but I’m not sure that’s correct. Maybe someone like terpsfan or Tango can weigh in on it (I know terps has looked at it a lot more than I have).

by cwyers on Feb 19, 2009 9:45 AM PST

Well that sounds awful.

by Kyle Boddy on Feb 18, 2009 11:30 AM PST

"Hey, this hamburger meat is a bit old, but we can still make a burger out of it...

… you know what would fix it? Those buns that have been sitting out in that really humid room."

by devil_fingers on Feb 18, 2009 12:01 PM PST

But I agree

That they are moving backwards. Will Carroll is trying his best to keep it fresh, updated, and Web 2.0-ish, but the truth is that the best contributors have moved on and Nate Silver has his hands in so many other things that I highly doubt he is innovating very much.

All Fangraphs did was aggregate all the data and formulas into one place, something that Tango, SABR Matt, and many other prominent people never got around to doing. As a result, they are becoming the gold standard for sabermetrics.

by Kyle Boddy on Feb 17, 2009 9:59 AM PST

And despite BP's protestations that it can't be done...

they are, in fact, doing it all free of charge.

VORP isn’t the only thing at BP that’s messed up – their fielding statistic, FRAA, is a joke. EqA, like MLVR, undervalues walks and doubles (I think MLVR is based on EqA). Anyone whose numbers tell you that the Twins trade with the Rays (you know, the Delmon Young for Garza and Bartlett one) was a draw is someone who has made a severe miscalculation.

I think that BP really runs the risk of becoming irrelevant if they don’t completely change things around by 2010. They still do good things – Carroll is excellent on injuries, PECOTA is excellent (but not much better than CHONE), and some of their interviews with players, prospects, and others inside the game are fun to read. But the foundation of any performance analysis organization is the numbers, and BP’s is crumbling.

by BraveBronco0121 on Feb 17, 2009 5:52 PM PST

actually, if you read the linked articles from Tango and Wyers (on VORP) above

MLV (the basis of VORP) is completely separate from EqA/R. EqA/R is basically a proxy for linear weights, and, in fact, they do have (not on the team audit pages) RARP, which is based on EqA (or EqR). I’ve run the MySQL query that Colin provides somewhere that shows the discrepancy. I think that would solve the walks problem, at least. You can see RARP here, for example, although it isn’t nearly as prominent as VORP. It doesn’t match JustVORP or WAR exactly, of course, because they still work from league averages by position a la VORP (I believe), but there are still clear discrepancies between the VORP numbers for 2008 and the RARP numbers. Even between VORP and RARP, for example, there is an ~11 run difference on Pujols, 14 on Berkman, etc. So already they have a much better metric on hand, it’s just hard to find. They should put those numbers in the team audit pages.

I don’t want to shut this discussion down, and I agree with most of what you all are saying, but I didn’t want to totally rip on BP, just to point out the sort of stats to which we should be referring, and that, in fact, the “differences” (incorrect valuations) are not just trivial, but significant.

by devil_fingers on Feb 17, 2009 6:32 PM PST

Agreed

It’s worth it for PECOTA alone (besides being better, it is laid out extremely well on the site unlike CHONE), but the content on the site is sorely lacking with Woolner et al. gone. Transaction Report with Kahrl is good too, though, and Goldstein’s articles are solid but also eminently replaceable by a subscription to Baseball America.

Fangraphs and our own Beyond the Box Score continue to gain in the market for sabermetrics, IMO.

by Kyle Boddy on Feb 17, 2009 8:45 PM PST

PECOTA's better overall

but CHONE and ZiPS have definitely narrowed the gap

by devil_fingers on Feb 17, 2009 9:06 PM PST

I agree

But PECOTA is much cooler to look at with the player cards than CHONE. That’s worth something in my eyes!

by Kyle Boddy on Feb 17, 2009 9:43 PM PST

CHONE has improved that significantly this year

The “expanded” page for each player with a percentile breakdown, as well as run and win values, makes it as valuable and deep as PECOTA, if not more so (due to a meaningful description of value above replacement). CHONE’s expanded pages just aren’t quite as pretty as PECOTA’s player cards.

by NYRoyal on Feb 17, 2009 10:40 PM PST

Link?

I am using what I see on Fangraphs. I guess I should find the actual home page!

by Kyle Boddy on Feb 18, 2009 12:10 AM PST

right 'chere

http://www.baseballprojection.com/

by devil_fingers on Feb 18, 2009 6:07 AM PST

And wasn't Nate supposed to have those player cards out Monday?

yeah, I know, he’s busy calculating, what exactly?

by devil_fingers on Feb 18, 2009 6:08 AM PST

(Nate, you're still the Man. I'm available if you just want to chat about, um, whatever)

by devil_fingers on Feb 18, 2009 6:08 AM PST

how does FanGraphs pay its authors

The ads are really recent, and even that can’t generate that much money, can it?

None of my business, I’m not trying to pry, but I really wonder.

by devil_fingers on Feb 20, 2009 8:21 PM PST

For more concrete evidence

you can check my supplementary post at Royals Review on the Royals 2008 position players. I don’t want to overdo the Royals here since they are “my” team, but you might find it helpful.

by devil_fingers on Feb 17, 2009 6:33 PM PST
