MLB Power Ratings v1.0
Intro and Methodology
I’ve never made a serious attempt to create power ratings for any sport other than college football. The main reason is that college football is my favorite sport, so it is the one I am most interested in spending time on. However, I also feel that it is relatively poorly understood compared to the other major sports. Baseball sits at the opposite end of that spectrum: it is by far the best understood of all the major sports.
Interestingly, baseball fans seem to spend relatively little time debating which teams are good. College football fans will talk themselves into a frenzy debating whether a 10-2 SEC team is better than an 11-1 Pac-12 team and will make every kind of power rating to argue their case. Baseball fans will talk about run differential and Pythagorean win percentage, but they generally let the standings do the talking. No one is going to claim that an 81-win team was better than a 90-win team.
I thus decided to use MLB’s Statcast data to create some power ratings. My goal is to create one number that represents expected team strength going forward, and perhaps along the way I’ll find that some teams are actually much better or worse than their record. I decided to use xwOBA as my instrument of choice: I assume that a team’s xwOBA to date is a good estimate of its future wOBA. I then extrapolate the number of runs per game I expect the team to score from its expected future wOBA, and repeat the same process on the pitching side using xwOBA allowed. (For more on wOBA and how to calculate offensive output from it, read this excellent primer from FanGraphs. For more on the calculation of xwOBA and how it differs from wOBA, read this summary from the Statcast glossary.)
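As a rough illustration of the wOBA-to-runs step, here is a minimal sketch based on the wRAA relationship from the FanGraphs primer (runs above average per plate appearance equal wOBA minus league wOBA, divided by the wOBA scale). The league wOBA, the wOBA scale, the plate appearances per game, and the function names are all assumptions chosen for illustration. They are not necessarily the values or the exact method behind the ratings in this post.

```python
# All three constants are illustrative assumptions, not the article's values.
LEAGUE_WOBA = 0.310   # assumed league-average wOBA
WOBA_SCALE = 1.20     # assumed wOBA scale (FanGraphs publishes one per season)
PA_PER_GAME = 38.0    # assumed plate appearances per team per game


def runs_per_game_above_avg(woba: float) -> float:
    """Convert a wOBA (or xwOBA) into runs per game above the league average.

    Uses the wRAA relationship: runs above average per plate appearance
    equal (wOBA - league wOBA) / wOBA scale.
    """
    runs_per_pa = (woba - LEAGUE_WOBA) / WOBA_SCALE
    return runs_per_pa * PA_PER_GAME


def team_rating(xwoba_for: float, xwoba_allowed: float) -> float:
    """Total rating: hitting component plus pitching component."""
    hitting = runs_per_game_above_avg(xwoba_for)
    # A lower xwOBA allowed is better, so the pitching component flips sign.
    pitching = -runs_per_game_above_avg(xwoba_allowed)
    return hitting + pitching


# Hypothetical team: .334 xwOBA at the plate, .298 xwOBA allowed.
print(f"{team_rating(0.334, 0.298):+.2f} runs/game vs. average")
```

With these placeholder constants, every 10 points of xwOBA is worth roughly a third of a run per game, which gives a feel for how sensitive the ratings are to small changes in the input.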
There are a few clear limitations of this approach:
This crude model has no sense of individual players, just an entire team’s hitting or pitching statistics. It will not penalize the Braves for Ronald Acuña Jr.’s injury, nor will it reward a team that trades prospects for an ace pitcher at the trade deadline.
I’m assuming that past xwOBA is an accurate and unbiased predictor of future wOBA. This is a reasonable assumption for a sufficiently large sample size, but early in the season, a team’s season-to-date xwOBA is going to be quite volatile. At the time of writing (late May), each team has had approximately 2,000 plate appearances, so the numbers are a bit more stable.
This model has no sense of the value of defense. Perhaps a team’s wOBA allowed is consistently better than its xwOBA allowed because it has a strong defense. The model would underrate this team, since it assumes some regression to the mean that will not actually happen.
This is hardly the most advanced model of all time, but we may nonetheless be able to draw some interesting conclusions from it.
Preliminary Results (through 5/25/2024)
Alright, enough with the unnecessarily complicated methodology. What do the numbers actually say? All numbers here are meant to be interpreted as runs above the average team. For example, the Yankees’ hitting is 0.9 runs/game better than the average team, and their pitching is 0.3 runs/game better than the average team, so in total they are 1.2 runs/game better than average.
This crude rating system has determined that the four best teams in baseball are the Yankees, Phillies, Dodgers and Braves, which is a good sign. At the bottom, it has the Marlins, White Sox and Rockies as the worst teams, which also checks out. In between, we have some strange things going on. The Angels and Athletics are in the top half, while the Guardians are 23rd despite being 19 games over .500. Let’s use Cleveland as an example to dive into some interesting things about this exercise.
Cleveland as a Case Study
To analyze why my model hates the Guardians so much, I decided to look at every team’s run differential (as above, all numbers are from the beginning of the season through May 25th) and compare it with the run differential expected from both wOBA and xwOBA.
Some data definitions:
Run Diff: Actual runs scored - Actual runs allowed
wOBA Run Diff: Run differential, as estimated by wOBA accrued and allowed
xwOBA Run Diff: Run differential, as estimated by xwOBA accrued and allowed
Actual runs over wOBA: Actual run differential - wOBA run differential
xwOBA - wOBA runs: xwOBA runs - wOBA runs
Net runs over xwOBA: Actual run differential - xwOBA run differential
The main column to focus on here is “Net runs over xwOBA”, which is a team’s actual run differential minus what my model would have expected so far. Cleveland’s run differential is an astonishing 94 runs better than my model would have projected. We can break down net runs over xwOBA into two distinct factors. Understanding these two factors is crucial, as they tell us how and why a team’s actual performance differs from what the model would expect.
(1) Actual runs over wOBA: Teams that are good at this are scoring more runs than their distribution of batting outcomes would suggest (and allowing fewer runs than the distribution of batting outcomes they allow would suggest). That is, they tend to cluster their hits together in the same inning.
(2) xwOBA - wOBA runs: Teams that are good at this are scoring more runs, and allowing fewer, than their batted ball data would predict. That is, their wOBA-based run differential is better than their xwOBA-based one.
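The breakdown above can be sketched in a few lines. This is a minimal illustration with made-up numbers for a hypothetical team, not Cleveland’s actual splits, and the class and function names are my own. One thing it makes explicit is the sign: for the two factors to add up to net runs over xwOBA, the second factor has to be taken as wOBA run differential minus xwOBA run differential, the reverse of the order in the column name.

```python
from dataclasses import dataclass


@dataclass
class TeamRunDiffs:
    actual: float  # actual runs scored minus actual runs allowed
    woba: float    # run differential estimated from wOBA for and against
    xwoba: float   # run differential estimated from xwOBA for and against


def decompose(team: TeamRunDiffs) -> tuple[float, float, float]:
    """Split net runs over xwOBA into sequencing and batted-ball factors."""
    sequencing = team.actual - team.woba   # factor (1): actual runs over wOBA
    batted_ball = team.woba - team.xwoba   # factor (2): wOBA runs over xwOBA runs
    net = team.actual - team.xwoba         # net runs over xwOBA
    return sequencing, batted_ball, net


# Hypothetical team, numbers invented purely for illustration.
example = TeamRunDiffs(actual=50.0, woba=20.0, xwoba=-10.0)
sequencing, batted_ball, net = decompose(example)

# The two factors always sum to the net figure.
assert sequencing + batted_ball == net
print(sequencing, batted_ball, net)
```

In this made-up case the team is 60 runs ahead of the model, with half coming from sequencing and half from outperforming its batted ball data.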
One important question to grapple with is whether we think teams can consistently outperform (or underperform) the model by being systematically strong in either category. Factor (1) is pretty much just clutch hitting: a team that clusters all of its baserunners in one inning is going to score far more runs than my model expects. Conveniently, the teams that are the best in this metric (Cleveland and Kansas City) are also the MLB leaders in OPS+ with runners in scoring position, and by quite a large margin.
This raises the question of whether this is a repeatable skill. There are some sports where I am willing to believe “clutch” performance is a real skill. For example, in tennis, managing one’s energy level is hugely important, so players will try harder in big moments. Novak Djokovic famously wins tiebreak points at a much higher rate than he does low-leverage points. However, evidence for clutchness being a repeatable thing in baseball is mixed, at best. This makes sense given that (with the exception of starting pitchers) baseball players are expending energy in short, concentrated bursts. There are some reasons to believe teams might consistently outperform in big situations (perhaps better bullpen management in high-leverage situations, for example), but I think a lot of regression to the mean can be expected here.
Factor (2) is harder to wrap one’s head around. Perhaps some teams are better at finding holes in opposing defenses and thus outperform their batted ball data. A stronger potential explanation is that some teams may persistently allow fewer runs than their xwOBA surrendered would indicate because their defense is good. Unfortunately, I see little correlation between the strongest teams in this metric and the best defensive teams.
In summary, my model thinks the Guardians are much worse than their record because they’ve hit obscenely well in the clutch (their OPS+ with runners in scoring position is 151) and they have also had good batted ball luck. Anecdotally, it seems to me that Vegas agrees: the Guardians were only -130 favorites on the road against the Rockies today, with comparably poor starting pitchers on the mound for both teams. Compare this to the Phillies, who have a marginally better record. They were -220 to -250 favorites at Colorado in every game over the weekend.
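To put those moneylines on a common scale, here is a small sketch of the standard conversion from American odds to implied win probability. The function name is my own, and the result still includes the bookmaker’s margin, so it slightly overstates the true probability the market assigns to the favorite.

```python
def implied_probability(american_odds: int) -> float:
    """Implied win probability from American moneyline odds (margin not removed)."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win the quoted odds.
    return 100 / (american_odds + 100)


# The three lines quoted above.
for line in (-130, -220, -250):
    print(f"{line}: {implied_probability(line):.1%}")
```

The Guardians’ -130 works out to about a 57% implied chance, while the Phillies’ -220 to -250 range is roughly 69% to 71% against the same opponent in the same park.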
Conclusion
I think that this model is a nice starting point, but as I said at the beginning of the article, there are some very clear limitations. The next thing I would like to do is add starting pitchers to the model, which I’ve already begun some initial work on. I’ll post periodic updates here over the coming weeks with updated team power ratings as well as information about how the model is performing.