JustVORP versus Just VORP: The (Sort of) Big Rundown
As one of Baseball Prospectus' flagship stats, VORP (Value Over Replacement Player) is probably best known to the general public as some nerdly invention that infuriates crusty old deans of sports journalism like Murray Chass and Jon Heyman. "The" internet responded in predictable (if predictably hilarious) fashion, in blog posts undoubtedly typed with 'faces born of contempt.'
I'm not writing to continue that tired non-debate or to make fun of those crotchety ol' "mainstream media" types. Rather, I'm going to review some legitimate criticisms of VORP for position players (there are other issues for pitchers) and see what sort of practical weight the problems of VORP might have. That is, does it matter beyond just the "few runs" that internet sabermetricians are carping about?
In an earlier column, I discussed and utilized Justin Inaz's Total Value spreadsheet for 2008, a very carefully constructed "überstat" for evaluating individual players. There are many cool things about the spreadsheet that I discuss in the earlier article. You can also find it referenced a fair bit elsewhere on the 'Net.
One prominent use of Justin's numbers is Sky Kalkman's series at Beyond the Box Score reviewing the best players of the 2008 season at each position. What's even cooler is that Justin has allowed Sky access to his spreadsheet. Using the data, Sky put together a "JustVORP" spreadsheet. Basically, Sky just leaves Justin's defensive stuff (well-done in its own right) out, and gives a list of each player's positionally-adjusted runs above replacement level. Thus, it neatly parallels Baseball Prospectus's VORP (created by Keith Woolner), which also does not include defense, but gives the positionally-adjusted offensive runs above replacement for individual players.
I am certainly not dismissing defense (it's the 'New Moneyball,' after all). However, since there is so much disagreement about how defense should be evaluated (whether or not stats should be used at all, and if so, which ones), using stats like VORP (and JustVORP), which give everything "but" defense, allows the user to then figure in the defensive adjustments in his or her own fashion when evaluating a player.
Just to make this clear again: I am not saying that we should forget about defensive stats. I am simply leaving them aside for the purposes of this column -- to see how much VORP does or doesn't go wrong in evaluating a player's relative offensive contributions. Along the same lines, this shouldn't be taken as an endorsement of positionally adjusted offensive stats taken on their own. I am very much against a "this player doesn't hit well enough to be a corner outfielder" sort of thinking which doesn't take defense into account -- see Dave Cameron's excellent write-up on that topic. I'm using position-adjusted offensive stats here just to see where VORP goes wrong, considered on its own terms.
So what are the differences between JustVORP and VORP? Why prefer one to the other? Sky summarizes it well in his JustVORP post, which you should read, but I will summarize some of the issues briefly:
- BP's VORP uses Marginal Lineup Value as its base run estimator. As has been shown in various places, MLV undervalues walks by half. That's right, Baseball Prospectus, the avant-garde of the OBP revolution (well, maybe 40-50 years after Branch Rickey), have the value of walks wrong in one of their flagship stats. Justin's stats, on the other hand, use BaseRuns (discussed in many helpful places -- see Colin Wyers' "Batman" series listed at the end of this article for one) to establish the correct linear weight of each event. In short, BaseRuns is a dynamic run estimator that allows for the changing run expectancy of each baseball event in different run environments -- even the most extreme, where even linear weights break down (as for Runs Created and MLV, one might say they were broken to begin with...).
- JustVORP adjusts for the superiority of AL pitching; VORP does not.
- VORP defines replacement level in relation to the league offensive average at each position for the season. This has the weird result of making DHs seem more valuable than 1Bs, because the average DH isn't hitting well. Yet we know that DH is the easiest position to "replace," and is thus (all things being equal) the least valuable. Just because teams aren't making good use of their DH spot doesn't change that fact. That's just the most obvious example -- in some seasons, the average CF hits better than the average LF (think of the 1950s). But that certainly doesn't mean that the LFers are harder to find and replace, because all CFers can play LF, but not vice-versa. VORP (in some eras) might lead one to think otherwise. JustVORP avoids this by using Tom Tango's positional adjustments (prorated for playing time, of course): Catcher +12.5, SS +7.5, 2B/3B/CF +2.5, LF/RF -7.5, 1B -12.5, DH -17.5.
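To make the "prorated for playing time" part concrete, here is a minimal sketch in Python. The adjustment values are the ones listed above; the 700-PA "full season" denominator and the function name are my own assumptions for illustration, not necessarily what Justin's or Sky's spreadsheets use.

```python
# Per-season positional adjustments in runs, as quoted in the list above.
POSITION_ADJUSTMENT = {
    "C": 12.5, "SS": 7.5,
    "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5,
    "1B": -12.5, "DH": -17.5,
}

# Assumption: what counts as a "full season" of playing time.
# The article does not say which denominator JustVORP uses.
FULL_SEASON_PA = 700


def prorated_adjustment(position, plate_appearances, full_season_pa=FULL_SEASON_PA):
    """Scale the full-season positional adjustment by the share of a season played."""
    return POSITION_ADJUSTMENT[position] * plate_appearances / full_season_pa


if __name__ == "__main__":
    # A catcher and a first baseman with identical playing time.
    for pos in ("C", "1B", "DH"):
        print(pos, round(prorated_adjustment(pos, 600), 1))
```

Under these assumptions, a catcher and a first baseman with the same full-season playing time end up 25 runs apart before either of them swings a bat, which is the whole point of fixing the adjustment rather than deriving it from how each position happened to hit that year.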
But what is the practical effect of all this? Is this just sabermetric nitpicking? In one discussion on the Book Blog, comparing VORP with BP's RARP (runs above replacement using Equivalent Runs -- a better run estimator anyway that approximates the correct linear weights), Colin Wyers found that RARP itself revealed VORP to be undervaluing not only players who walked a lot, but also people who hit a higher than usual proportion of doubles. The average VORP "error" along these lines for regulars was about 5.8 runs, and about 6.6 for the doubles hitters. That's a pretty big error, once you realize that that's more than half a win -- and teams are paying between four and five million dollars for each marginal win in 2009. And that's just addressing the problem in terms of the linear weights.
What does this come to in concrete terms? For the sake of discussion, let's assume that JustVORP is on the right track in terms of run estimation, positional adjustment, etc. How big of a difference does just using plain ol' VORP make? Let's take the top ten position players in baseball according to VORP (rounded to the nearest run for reasons having to do with my less-than-brilliant abilities in importing data into MySQL):
| Name | 2008 VORP |
| Albert Pujols | 99 |
| Hanley Ramirez | 79 |
| Chipper Jones | 76 |
| Lance Berkman | 73 |
| David Wright | 66 |
| Chase Utley | 63 |
| Alex Rodriguez | 63 |
| Jose Reyes | 63 |
| Matt Holliday | 62 |
| Dustin Pedroia | 60 |
Not too many surprises there. But now, let's compare BP's VORP numbers with their JustVORP numbers for the same players:
| Name | VORP | JustVORP | Difference |
| Albert Pujols | 99 | 78 | +21 |
| Hanley Ramirez | 79 | 76 | +3 |
| Chipper Jones | 76 | 68 | +8 |
| Lance Berkman | 73 | 57 | +16 |
| David Wright | 66 | 65 | +1 |
| Chase Utley | 64 | 56 | +8 |
| Alex Rodriguez | 63 | 62 | +1 |
| Jose Reyes | 63 | 59 | +4 |
| Matt Holliday | 62 | 46 | +16 |
| Dustin Pedroia | 60 | 53 | +7 |
Not all of those are minor differences. Albert Pujols is almost certainly the best player in professional baseball at the moment, but overrating him by 20 runs -- well, if a team was using VORP to evaluate what they should pay a player, that's around a $10 million difference. Berkman and Holliday's ratings are almost as problematic. Note that all three of those players are from the "easy" end of the defensive spectrum, as well as being from the NL.
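For anyone who wants to check the Difference column, here is a small Python sketch that recomputes it from the VORP and JustVORP figures in the table above. The 10-run cutoff for calling a gap "large" (roughly one win) is my own assumption for illustration.

```python
# (name, VORP, JustVORP) rows copied from the comparison table above.
TOP_TEN = [
    ("Albert Pujols", 99, 78),
    ("Hanley Ramirez", 79, 76),
    ("Chipper Jones", 76, 68),
    ("Lance Berkman", 73, 57),
    ("David Wright", 66, 65),
    ("Chase Utley", 64, 56),
    ("Alex Rodriguez", 63, 62),
    ("Jose Reyes", 63, 59),
    ("Matt Holliday", 62, 46),
    ("Dustin Pedroia", 60, 53),
]


def differences(rows):
    """Return {name: VORP - JustVORP}; positive means VORP rates the player higher."""
    return {name: vorp - justvorp for name, vorp, justvorp in rows}


def large_gaps(rows, threshold=10):
    """Names whose gap is at least `threshold` runs (assumed cutoff, about one win)."""
    return [name for name, diff in differences(rows).items() if abs(diff) >= threshold]


if __name__ == "__main__":
    diffs = differences(TOP_TEN)
    print(diffs)
    print("mean gap:", sum(diffs.values()) / len(diffs))
    print("large gaps:", large_gaps(TOP_TEN))
```

The three players it flags are the same three singled out above: Pujols, Berkman, and Holliday.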
I do not want to lose myself in statistical detail that I'm not qualified to go into anyway. The best thing to do would be to go through multiple years and check the standard deviations, the variance, the error, and (as Colin did) the average error. What I will do is easier and more straightforward, if not as definitive. To check out the "positional biases," I went position-by-position (primary position as defined by VORP for players who played multiple positions), and took all the players with at least 500 plate appearances (400 for catchers and DHs). Then I averaged the difference between VORP and JustVORP by player. Again, this is a measurement of the average over/underrating of each position. It is not the measure of the variance. It is not a multi-year study, and the sample sizes are low. So these only allow us, at best, to form some preliminary hypotheses about VORP's positional biases.
| Position | Difference |
| DH | +10.8 |
| 1B | +6.8 |
| 2B | -0.8 |
| 3B | -2.7 |
| SS | +2.1 |
| LF | +2.0 |
| RF | +0.1 |
| CF | -1.1 |
| C | -4.7 |
There it is. While the sample size is too small and the method too crude in our little analysis to say for sure, so far the hypothesis that VORP overvalues 1Bs and DHs -- the easiest-to-replace part of the talent pool -- and undervalues harder-to-fill positions, particularly catchers and third basemen, is largely borne out (shortstops and left fielders come out about two runs overrated in this sample, which cuts the other way). While one or two runs isn't that bad, 7 runs per first baseman is.
(To repeat: we're leaving defense aside for the sake of getting at the offensive and positional adjustment difference. A complete player evaluation must take defense into account, of course.)
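The position-by-position averaging described above is simple enough to sketch in Python. This is not the query I actually ran; the record layout and the sample players below are made-up placeholders, and only the playing-time cutoffs (500 PA, 400 for catchers and DHs) come from the text.

```python
from collections import defaultdict
from statistics import mean


def min_pa(position):
    """Playing-time cutoff from the article: 400 PA for C and DH, 500 otherwise."""
    return 400 if position in {"C", "DH"} else 500


def average_difference_by_position(players):
    """Average (VORP - JustVORP) per primary position among qualifying players."""
    diffs = defaultdict(list)
    for p in players:
        if p["pa"] >= min_pa(p["pos"]):
            diffs[p["pos"]].append(p["vorp"] - p["justvorp"])
    return {pos: mean(values) for pos, values in diffs.items()}


if __name__ == "__main__":
    # Hypothetical records for illustration only; not real 2008 data.
    sample = [
        {"name": "First Baseman A", "pos": "1B", "pa": 600, "vorp": 30, "justvorp": 22},
        {"name": "First Baseman B", "pos": "1B", "pa": 550, "vorp": 20, "justvorp": 16},
        {"name": "First Baseman C", "pos": "1B", "pa": 300, "vorp": 10, "justvorp": 0},
        {"name": "Catcher D", "pos": "C", "pa": 420, "vorp": 15, "justvorp": 20},
        {"name": "Catcher E", "pos": "C", "pa": 390, "vorp": 5, "justvorp": 12},
    ]
    print(average_difference_by_position(sample))
```

A positive average means VORP rates that position higher than JustVORP does. As noted above, this measures average bias only; it says nothing about the spread within a position.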
Let's take some concrete cases. Ryan Howard, for example, just got a contract extension from the Phillies that will buy out his remaining arbitration years. I won't give a full analysis here. Matthew Carruth argues that Howard probably got overpaid. That almost never happens when a player gets bought out pre-free agency, but it did in this case because the (avoided) arbitration hearings would have given too much weight to home runs and RBI and not considered the things that VORP is supposed to look at, such as positional adjustments. But the terms of the signing don't concern us directly here. If you were a GM, and just used VORP from BP (assume defense to be equal), what would your evaluation of Howard's 2008 contribution over a replacement level first baseman be? 36 runs, or about 3.5 wins. Assuming $5 million per marginal win, that implies a salary of $17.5 million a year. JustVORP, on the other hand, says that Howard was 25 runs above a replacement first baseman. That's about a win, or $5 million, difference. That's a lot of money, even to a baseball team. And in this case, I think that VORP is probably overvaluing Howard for one or two reasons: 1) VORP proceeds from positional averages, not accounting for the fact that first base is more easily replaceable, even with players who normally play other positions, because it is the easiest position to play on the field; and 2) VORP does not adjust for the easier pitching in the National League.
A very different case is Akinori Iwamura, the second baseman of the Tampa Bay Rays. VORP values him at 17 runs above replacement, while JustVORP has him at 27 RAR. Again, that's about a win, or $5 million. The problem again is probably positional adjustment and league difficulty adjustment, and proportionally, for a player like Iwamura who clearly isn't a star, that's a big difference -- the difference (assuming average defense) between a 1.7 WAR, below average, stopgap starter who might need to have a replacement available coming up through the minors, and a 2.7 WAR slightly above average player (average being around 2 WAR) who is easily holding down his spot.
The point is clear. Tango, Wyers, and others have pointed out that the run estimator at the heart of VORP is flawed in its treatment of walks and also doubles. When considering positional adjustments and the difference in difficulty between leagues, even in the few cases we looked at here (the top ten players by VORP, as well as Howard and Iwamura) as well as the general "error" by position, it's clear that VORP misvalues more than a few players, and by more than a little. Sabermetrics is supposed to help teams get better evaluations of players. If the assumptions on which JustVORP (and similar "überstats" such as FanGraphs' WAR -- that comparison will be for another column) is based are true, then in comparison, VORP fails on that score.
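The runs-to-wins-to-dollars arithmetic in the Howard and Iwamura examples can be written out as a short Python sketch. It assumes the rough rule of thumb of 10 runs per win (which is what turning 17 runs into 1.7 WAR implies) and the $5 million per marginal win figure used above; the function names are just illustrative.

```python
RUNS_PER_WIN = 10          # rule of thumb implied by "17 runs ... 1.7 WAR"
DOLLARS_PER_WIN = 5_000_000  # the article's assumed price of a marginal win


def runs_to_wins(runs_above_replacement):
    """Convert runs above replacement to wins above replacement."""
    return runs_above_replacement / RUNS_PER_WIN


def implied_salary(runs_above_replacement):
    """Dollar value of a season, at the assumed price per marginal win."""
    return runs_above_replacement * DOLLARS_PER_WIN / RUNS_PER_WIN


def valuation_gap(vorp, justvorp):
    """How many dollars VORP over- (positive) or under- (negative) values a player."""
    return implied_salary(vorp) - implied_salary(justvorp)


if __name__ == "__main__":
    for name, vorp, justvorp in [("Ryan Howard", 36, 25), ("Akinori Iwamura", 17, 27)]:
        print(name, runs_to_wins(vorp), runs_to_wins(justvorp), valuation_gap(vorp, justvorp))
```

Without the rounding to "about 3.5 wins," Howard's 36 runs come to 3.6 wins and $18 million, so the unrounded gap is a bit more than $5 million; Iwamura's gap is the same size in the other direction.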
Oh, and if you're a sabermetrically-oriented blogger, and don't want to become an out-of-touch old man, take a hint from Murray Chass and Jon Heyman. Stop using VORP. Don't worry, though. BaseRuns and wOBA will bug them even more.
[Brief Update: I have written a supplementary post comparing the VORP and JustVORP numbers for the 2008 Royals over at Royals Review. I don't want to overdo the Royals-related content here, but if you want another concrete comparison of how big the difference can be, there you have it...]
Comments
I wonder if BP is going to try to make VORP catch up with the rest of sabermetrics
Clearly VORP is behind the times and badly in need of a few different types of upgrades. But it doesn’t appear that BP is particularly interested in innovating in the field of statistical analysis. They still undertake statistical analysis, but only with existing tools. And mostly they just use their own tools, and are very hesitant to pick up innovations from others in the field. So they aren’t updating and they aren’t innovating. It’s pretty sad considering that site was once the gold standard in sabermetrics.
The immoderate moderator
by NYRoyal on Feb 17, 2009 7:02 AM PST
They did SuperVORP
But I don’t know enough about it to comment on the changes.
Webmaster of Driveline Mechanics
http://www.drivelinemechanics.com - An Unconventional Look at Scouting
by Kyle Boddy on Feb 17, 2009 9:58 AM PST
SuperVORP is just VORP+FRAA. None of the underlying fundamentals of VORP have changed.
by cwyers on Feb 18, 2009 9:40 AM PST
hey, folks, it's our first celebrity
(well, from my point of view, and at least in one of my posts)
Nice to see you here, Colin.
Bringing you more-or-less replacement level analysis and commentary since sometime in 2008.
by devil_fingers on Feb 18, 2009 10:32 AM PST
oh yeah
Adding FRAA to VORP... THAT should make it better (winky face)
Colin, am I right when I assert that EqA/R doesn’t have the same problem with walks that VORP does? Are the weights pretty close to right? Since I made that remark, I’ve read some older stuff on both sides of the issue. I don’t want to confuse things any more with my stupidity…
by devil_fingers on Feb 18, 2009 10:35 AM PST
I didn't realize I was a celebrity.
I’m sure my wife would be surprised to hear it.
EqA is pretty close to right when it comes to the value of the walk, HR, etc. Most of the issues I have with EqA/EqR come down to it being simply inconvenient compared to a more plain-vanilla LWTS without actually being any better. There’s nothing WRONG with it, per se.
by cwyers on Feb 18, 2009 11:20 AM PST
Yeah, that's what I thought about EqA -- too complicated
You’re a celebrity to me because you got me into MYSQL, which is a much more efficient way for me to waste time… That, and I like your columns.
At some point, I want to pick your brains about your BaseRuns version of FIP, which I have messed with a bit, but I want to make sure I’m doing it right.
Also, it’s not worth starting a new thread for: how bad are the B-R park factors that are in the BDB database? Are they useful at all, if I wanted to park-adjust some of my wOBA/LWTS figures? Are they the multi-year ones, or the single year?
Anyway, don’t mean to pick your brains too much here, just had some questions and no good place to ask, and maybe others can benefit if they read it here. I’ll hold off in begging for a Retrosheet tutorial…
by devil_fingers on Feb 18, 2009 12:04 PM PST
If you (or anyone else) has some sort of staty question you’d like answered (particularly when it concerns something I wrote), feel free to drop me a line at:
pontifexexmachina at hotmail.com
I try to keep up with that sort of e-mail, but sometimes I tend to fall behind on my correspondence.
The BDB park factors – I hope those are the three-year factors, but I don’t know. If they’re the three year, they’re probably better than no park adjustment. For applying park factors to LWTS, I prefer the linear method I outlined in my player value series over at THT. Essentially, take the park factor and divide by 100 to get the actual park factor.
Then take the average R/PA (normally about .12) and park adjust that – so if your park factor is 1.09, then you get:
1.09*.12=.1308
Then take:
.12-.1308=-0.0108
Then, for a hitter with 650 PAs, you’d simply take:
-0.0108*650=-7.02
And then you add that to the LWTS total, so if you had a player who was, say, +20 according to LWTS they become +13.
(A case could be made to apply the park factor per out instead of per PA. It’s something I haven’t really considered yet.)
A better set of park factors to use is Patriot’s historical park factors. He has 2007 and 2008 park factors available elsewhere on his site as well.
As for park adjusting wOBA, I haven’t really put much thought into it. I normally just divide by the park factor, but I’m not sure that’s correct. Maybe someone like terpsfan or Tango can weigh in on it (I know terps has looked at it a lot more than I have).
by cwyers on Feb 19, 2009 9:45 AM PST
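For readers following along, the linear park adjustment walked through in the comment above can be sketched as a small Python function. This only restates the per-PA arithmetic from the comment; the function and parameter names are illustrative, and the .12 runs per PA default is the "normally about" figure given there.

```python
def park_adjusted_lwts(lwts_runs, park_factor_pct, plate_appearances, league_runs_per_pa=0.12):
    """Apply the per-PA linear park adjustment described in the comment above.

    park_factor_pct is on the 100 scale (e.g. 109 for a hitter's park).
    """
    park_factor = park_factor_pct / 100                  # 109 -> 1.09
    park_runs_per_pa = park_factor * league_runs_per_pa  # 1.09 * .12
    adjustment_per_pa = league_runs_per_pa - park_runs_per_pa
    return lwts_runs + adjustment_per_pa * plate_appearances


if __name__ == "__main__":
    # The worked example from the comment: +20 LWTS, 650 PA, park factor 109.
    print(round(park_adjusted_lwts(20, 109, 650), 2))
```

With the comment's own numbers, the +20 hitter comes out at roughly +13, matching the figure given there. Applying the factor per out instead of per PA, as the comment suggests might be defensible, would need a different denominator and is not attempted here.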
Well that sounds awful.
by Kyle Boddy on Feb 18, 2009 11:30 AM PST
"Hey, this hamburger meat is a bit old, but we can still make a burger out of it...
… you know what would fix it? Those buns that have been sitting out in that really humid room."
by devil_fingers on Feb 18, 2009 12:01 PM PST
But I agree
That they are moving backwards. Will Carroll is trying his best to keep it fresh, updated, and Web 2.0-ish, but the truth is that the best contributors have moved on and Nate Silver has his hands in so many other things that I highly doubt he is innovating very much.
All Fangraphs did was aggregate all the data and formulas into one place, something that Tango, SABR Matt, and many other prominent people never got around to doing. As a result, they are becoming the gold standard for sabermetrics.
by Kyle Boddy on Feb 17, 2009 9:59 AM PST
And despite BP's protestations that it can't be done...
they are, in fact, doing it all free of charge.
VORP isn’t the only thing at BP that’s messed up – their fielding statistic, FRAA, is a joke. EqA, like MLVR, undervalues walks and doubles (I think MLVR is based on EqA). Anyone whose numbers tell you that the Twins trade with the Rays (you know, the Delmon Young for Garza and Bartlett one) was a draw is someone who has made a severe miscalculation.
I think that BP really runs the risk of becoming irrelevant if they don’t completely change things around by 2010. They still do good things – Carroll is excellent on injuries, PECOTA is excellent (but not much better than CHONE), and some of their interviews with players, prospects, and others inside the game are fun to read. But the foundation of any performance analysis organization is the numbers, and BP’s is crumbling.
by BraveBronco0121 on Feb 17, 2009 5:52 PM PST
actually, if you read the linked articles from Tango and Wyers (on VORP) above
MLV (the basis of VORP) is completely separate from EqA/R. EqA/R is basically a proxy for linear weights, and, in fact, they do have (not on the team audit pages) RARP, which is based on EqA (or EqR). I’ve run the MySQL query that Colin provides somewhere that shows the discrepancy. I think that would solve the walks problem, at least. You can see RARP here, for example, although it isn’t nearly as prominent as VORP. It doesn’t match JustVORP or WAR exactly, of course, because they still work from league averages by position a la VORP (I believe), but there are still clear discrepancies between the VORP numbers for 2008 and the RARP numbers. Even between VORP and RARP, for example, there is an ~11 run difference on Pujols, 14 on Berkman, etc. So already they have a much better metric on hand, it’s just hard to find. They should put those numbers in the team audit pages.
I don’t want to shut this discussion down, and I agree with most of what you all are saying, but I didn’t want to totally rip on BP, just to point out the sort of stats to which we should be referring, and that, in fact, the “differences” (incorrect valuations) are not just trivial, but significant.
by devil_fingers on Feb 17, 2009 6:32 PM PST
Agreed
It’s worth it for PECOTA alone (besides being better, it is laid out extremely well on the site unlike CHONE), but the content on the site is sorely lacking with Woolner et al. gone. Transaction Report with Kahrl is good too, though, and Goldstein’s articles are solid but also eminently replaceable by a subscription to Baseball America.
Fangraphs and our own Beyond the Box Score continue to gain in the market for sabermetrics, IMO.
by Kyle Boddy on Feb 17, 2009 8:45 PM PST
PECOTA's better overall
but CHONE and ZiPS have definitely narrowed the gap
by devil_fingers on Feb 17, 2009 9:06 PM PST
I agree
But PECOTA is much cooler to look at with the player cards than CHONE. That’s worth something in my eyes!
by Kyle Boddy on Feb 17, 2009 9:43 PM PST
CHONE has improved that significantly this year
The “expanded” page for each player with a percentile breakdown, as well as run and win values, makes it as valuable and deep as PECOTA, if not more so (due to a meaningful description of value above replacement). CHONE’s expanded pages just aren’t quite as pretty as PECOTA’s player cards.
by NYRoyal on Feb 17, 2009 10:40 PM PST
Link?
I am using what I see on Fangraphs. I guess I should find the actual home page!
by Kyle Boddy on Feb 18, 2009 12:10 AM PST
right 'chere
http://www.baseballprojection.com/
by devil_fingers on Feb 18, 2009 6:07 AM PST
And wasn't Nate supposed to have those player cards out Monday?
yeah, I know, he’s busy calculating, what exactly?
by devil_fingers on Feb 18, 2009 6:08 AM PST
(Nate, you're still the Man. I'm available if you just want to chat about, um, whatever)
by devil_fingers on Feb 18, 2009 6:08 AM PST
Fangraphs might be doing it free of charge, but I doubt they're making anywhere near the amount of money BPro is.
Fewer people to pay, but still…
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Feb 20, 2009 7:51 PM PST
how does FanGraphs pay its authors
The ads are really recent, and even that can’t generate that much money, can it?
None of my business, I’m not trying to pry, but I really wonder.
by devil_fingers on Feb 20, 2009 8:21 PM PST
For more concrete evidence
you can check my supplementary post at Royals Review on the Royals 2008 position players. I don’t want to overdo the Royals here since they are “my” team, but you might find it helpful.
by devil_fingers on Feb 17, 2009 6:33 PM PST













