MLB Power Ratings Update: 8/13/2024
Ratings Update
Back in May I took a stab at creating some initial MLB power ratings. It’s been two and a half months since that post, so I figured it was time to update the ratings and share some new thoughts on my methodology. I’m not going to go through how the ratings work in this post; if you’re interested, see the link above.
The first table shows the ratings as of today (August 13th) and the second table shows the change in a team’s rating since the previous update on May 26th.
The Yankees and Dodgers are at the top of the list, but both ratings have gone down by 0.3 runs a game since the last update. Both clubs were scorching hot around Memorial Day but have played roughly .500 baseball since then, so this checks out. At the other end, I was pretty surprised not to see the White Sox in last. The model stubbornly sees the Rockies as a good bit worse than them. Overall, the results are sensible, with the top 8 spots all held by likely playoff teams.
Methodology Update
The key aspect of my model that I highlighted in my last post was that it relies on batted ball statistics to calculate how many runs a team ought to score and allow. I hypothesized that some teams, like Cleveland, were scoring more runs than my model expected because of unsustainable clutch hitting luck. If this hypothesis is true, then it would follow that teams that outperformed my model in April and May should not be expected to outperform again in June and July.
The scatterplot above tests this theory. On the x-axis we have how many runs a team’s actual run differential beat my model’s expectation by from April 1st to May 26th, and on the y-axis we have the same figure from May 27th to today. If my model is working as I would expect, this scatterplot should look like random noise. Cleveland is all the way at the right at (94, 23). Some teams (Guardians, Royals, Brewers) are firmly in the top right, meaning that they outperformed their expected statistics in both periods. Some teams (White Sox, Marlins, Rockies) are in the bottom left, meaning they underperformed in both periods. Overall there is a weak positive trend (r = 0.27) but nothing large. I am willing to call this exercise a moderate success: I can’t say for sure there’s no relationship, but there’s also not a very strong one.
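The check above boils down to a Pearson correlation between each team's over-performance in the two periods. Here is a minimal sketch, assuming the per-team values are already in two equal-length lists. The function name and the toy inputs are illustrative only and are not the real team data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (the shared 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy inputs (NOT real team data): runs over expectation in each period.
early_period = [10, -5, 20, -15, 0]
late_period = [4, -6, -2, 3, 1]
print(round(pearson_r(early_period, late_period), 2))
```

A value near 0 would mean early over-performance says nothing about later over-performance, which is what the luck hypothesis predicts.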
Brief Postscript: The Yankees
Suppose Aaron Judge suffered a season-ending injury tomorrow. How much worse should I expect the Yankees to be? How much would their World Series chances decline? The first question is easy to answer with WAR, but I don’t see the second question discussed much.
Judge is on pace for a historic 12 fWAR season, so we can estimate that the Yankees will be 12 wins worse without him. They’re on pace to be a 95 win team, so let’s say they’d be an 83 win team without him. But how does this look in terms of my model?
According to Fangraphs, Judge has been worth 80 runs more than a replacement player this year, or roughly ⅔ of a run per game. Just dropping him from the lineup would change the Yankees’ rating from +0.88 to +0.21. This seems about the same as calling them an 83 win team- a rating of +0.21 puts them right in between the Mets and Red Sox, who should both finish around that mark.
To put this in win probability terms, suppose the real Yankees played a Judge-less Yankees team. According to my model, the full-strength Yankees would have a 57% chance of winning a single game. Furthermore, the full-strength Yankees would have a 63% chance of winning a 5 game series and a 65% chance of winning a 7 game series. We can thus assume that the Yankees’ chance of winning the ALDS would go down by about 13 percentage points without Judge and their chances of winning the ALCS and World Series would go down by about 15 percentage points. Multiplying this all together, you get that the Yankees’ chance of winning the World Series is cut almost exactly in half without Judge.
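The series numbers follow from the single-game figure if you treat each game as an independent trial with the same win probability. The sketch below reproduces them under that assumption. It also assumes, as a simplification, that the Judge-less Yankees would be a coin flip in every playoff series, which is what subtracting from 63% and 65% implies. The function name is mine.

```python
from math import comb

def series_win_prob(p, best_of):
    """Chance of winning a best-of-N series given single-game win probability p,
    assuming games are independent and p is the same in every game."""
    wins_needed = best_of // 2 + 1
    q = 1 - p
    # Sum over k = number of games lost before the clinching win.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )

p_game = 0.57  # single-game figure quoted in the text
best_of_5 = series_win_prob(p_game, 5)
best_of_7 = series_win_prob(p_game, 7)

# Simplifying assumption: without Judge every series is 50/50,
# and with him each series gets the full head-to-head edge.
title_with = best_of_5 * best_of_7 * best_of_7
title_without = 0.5 ** 3

print(f"best-of-5: {best_of_5:.3f}, best-of-7: {best_of_7:.3f}")
print(f"title odds without Judge / with Judge: {title_without / title_with:.2f}")
```

The ratio comes out a little under one half, in line with the "cut almost exactly in half" estimate.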
MLB Power Ratings v1.0
Intro and Methodology
I’ve never made a serious attempt to create power ratings for any sport other than college football. The main reason for this is that college football is my favorite sport, so it is the one that I am most interested in spending time on. However, I also feel that it is relatively poorly understood compared to the other major sports. Baseball, on the other hand, lives on the opposite side of that spectrum- it is by far the best understood of all the major sports.
Interestingly, baseball fans seem to spend relatively little time debating which teams are good. College football fans will talk themselves into a frenzy debating whether a 10-2 SEC team is better than an 11-1 Pac-12 team and will make every kind of power rating to argue their case. Baseball fans will talk about run differential and Pythagorean win percentage, but they generally let the standings do the talking- no one is going to claim that an 81 win team was better than a 90 win team.
I thus decided to use MLB’s Statcast data to create some power ratings. My goal is to create one number that represents expected team strength going forwards- and perhaps along the way I’ll find that some teams are actually much better or worse than their record. I decided to use xwOBA as my instrument of choice- I assume that a team’s xwOBA to date is a good estimate of their future wOBA. I then extrapolate the number of runs per game I expect them to score from their expected future wOBA, and repeat the same process for pitching xwOBA. (For more on wOBA and how to calculate offensive output from it, read this excellent primer from Fangraphs. For more on the calculation of xwOBA and how it differs from wOBA, read this summary from the Statcast glossary).
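To make the conversion concrete, here is a rough sketch of turning a team xwOBA into runs per game above average. It uses the standard wRAA-style relationship (runs per plate appearance = (wOBA - league wOBA) / wOBA scale). The wOBA scale of about 1.2, the 38 plate appearances per game, and the function names are all assumptions for illustration; the post does not spell out the exact constants used.

```python
def runs_per_game_above_avg(team_xwoba, league_woba, woba_scale=1.2, pa_per_game=38.0):
    """Convert an xwOBA into expected runs/game relative to a league-average team.

    woba_scale and pa_per_game are assumed round numbers; the real values
    vary a little from season to season.
    """
    runs_per_pa = (team_xwoba - league_woba) / woba_scale
    return runs_per_pa * pa_per_game

def team_rating(hit_xwoba, pitch_xwoba_allowed, league_woba):
    """Total rating = hitting runs above average + pitching runs above average."""
    offense = runs_per_game_above_avg(hit_xwoba, league_woba)
    # Allowing a lower xwOBA than league average is good, so flip the sign.
    pitching = -runs_per_game_above_avg(pitch_xwoba_allowed, league_woba)
    return offense + pitching

# Hypothetical team: hits .340, allows .300, in a .315 league.
print(round(team_rating(0.340, 0.300, 0.315), 2))
```

Under these assumed constants, every 10 points of xwOBA is worth roughly a third of a run per game.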
There are a few clear limitations of this approach:
This crude model has no sense of any individual players, just an entire team’s hitting or pitching statistics. It will not penalize the Braves for Ronald Acuña Jr.’s injury, nor will it reward a team who trades prospects for an ace pitcher at the trade deadline.
I’m assuming that past xwOBA is an accurate and unbiased predictor of future wOBA. This is a reasonable assumption for a sufficiently large sample size, but early in the season, a team’s xwOBA season-to-date is going to be quite volatile. At the time of writing (late May), each team has had approximately 2000 plate appearances, so the numbers are a bit more stable.
This model has no sense of the value of defense. Perhaps a team’s pitching xwOBA allowed is consistently better than its wOBA because they have a strong defense- this model would underrate this team as it would assume some regression to the mean that will not actually happen.
This is hardly the most advanced model of all time but nonetheless we may be able to draw some interesting conclusions from it.
Preliminary Results (through 5/25/2024)
Alright, enough with the unnecessarily complicated methodology. What do the numbers actually say? All numbers here are meant to be interpreted as runs above the average team- that is, the Yankees’ hitting is 0.9 runs/game better than the average team, and their pitching is 0.3 runs/game better than the average team, so in total they are 1.2 runs/game better than average.
This crude rating system has determined the four best teams in baseball are the Yankees, Phillies, Dodgers and Braves- that’s a good sign. At the bottom, it’s determined that the Marlins, White Sox and Rockies are the worst teams, which also checks out. In between, we have some strange things going on. The Angels and Athletics are in the top half, while the Guardians are 23rd despite being 19 games over .500. Let’s use Cleveland as an example to dive into some interesting things about this exercise.
Cleveland as a Case Study
To analyze why my model hates the Guardians so much, I decided to look at every team’s actual run differential (as above, all numbers are from the beginning of the season through May 25th) and compare it with their expected run differential based on both wOBA and xwOBA.
Some data definitions:
Run Diff: Actual runs scored - Actual runs allowed
wOBA Run Diff: Run differential, as estimated by wOBA accrued and allowed
xwOBA Run Diff: Run differential, as estimated by xwOBA accrued and allowed
Actual runs over wOBA: Actual run differential - wOBA run differential
xwOBA - wOBA runs: xwOBA runs - wOBA runs
Net runs over xwOBA: Actual run differential - xwOBA run differential
The main column to focus on here is “Net runs over xwOBA”, which is a team’s actual run differential minus what my model would have expected so far. Cleveland’s run differential is an astonishing 94 runs better than my model would have projected. We can break down net runs over xwOBA into two distinct factors. Understanding these two factors is crucial as they essentially tell us how and why a team’s actual performance differs from what the model would expect.
(1) Actual runs over wOBA: Teams that are good at this are scoring more runs than their distribution of batting outcomes would suggest (and allowing fewer runs than their distribution of batting outcomes allowed would suggest). That is, they tend to cluster their hits together in the same inning.
(2) xwOBA runs - wOBA runs: Teams that are good at this are scoring more runs, and allowing fewer, than would be expected by their batted ball data.
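The two-factor breakdown is just an accounting identity, sketched below. For the two pieces to add up to net runs over xwOBA, the second piece is computed here as wOBA runs minus xwOBA runs, which is the "teams that are good at this" direction of factor (2). The function name and the input numbers are hypothetical and do not correspond to any real team.

```python
def decompose_overperformance(actual_rd, woba_rd, xwoba_rd):
    """Split actual-minus-xwOBA run differential into two additive pieces."""
    sequencing = actual_rd - woba_rd      # factor (1): actual runs over wOBA
    batted_ball = woba_rd - xwoba_rd      # factor (2): wOBA runs over xwOBA runs
    net = actual_rd - xwoba_rd            # net runs over xwOBA
    return {"sequencing": sequencing, "batted_ball": batted_ball, "net": net}

# Hypothetical run differentials, purely for illustration.
parts = decompose_overperformance(actual_rd=50, woba_rd=20, xwoba_rd=5)
print(parts)
```

Here `sequencing` captures hit clustering and `batted_ball` captures results beating contact quality; the two always sum to `net`.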
One important question to grapple with is whether we think teams can consistently outperform (or underperform) the model by being systematically strong at either category. Factor (1) pretty much is just clutch hitting- a team that clusters all of their baserunners in one inning is going to score far more runs than expected by my model. Conveniently, the teams that are the best in this metric (Cleveland and Kansas City) are also the MLB leaders in OPS+ with runners in scoring position, and by quite a large margin.
This raises the question of whether this is a repeatable skill. There are some sports where I am willing to believe “clutch” performance is a real skill. For example, in tennis, managing one’s energy level is hugely important so players will try harder in big moments- Novak Djokovic famously wins tiebreak points at a much higher rate than he does low leverage points. However, evidence for clutchness being a repeatable thing in baseball is mixed, at best. This makes sense given that (with the exception of starting pitchers) baseball players are expending energy in short, concentrated bursts. There are some reasons to believe teams might consistently outperform in big situations (perhaps better bullpen management in high leverage situations, for example), but I think a lot of regression to the mean can be expected here.
Factor (2) is harder to wrap one’s head around. Perhaps some teams are better at finding holes in opposing defenses and thus outperform their batted ball data. A stronger potential explanation is that some teams may persistently allow fewer runs than their xwOBA surrendered would indicate because their defense is good. Unfortunately I see little correlation between the strongest teams in this metric and the best defensive teams.
In summary, my model thinks the Guardians are much worse than their record because they’ve hit obscenely well in the clutch (their OPS+ with runners in scoring position is 151) and they have also had good batted ball luck. Anecdotally, it seems to me that Vegas agrees with me: the Guardians were only -130 favorites on the road against the Rockies today, with comparably poor starting pitchers on the mound for both teams. Compare this to the Phillies, who have a marginally better record. They were -220 to -250 favorites at Colorado every game over the weekend.
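To compare those betting lines on a common scale, American odds can be converted to implied win probabilities. This is a small sketch that ignores the bookmaker's margin, so the true implied probabilities are slightly lower. The function name is mine.

```python
def implied_prob(moneyline):
    """Implied win probability from American odds (bookmaker margin not removed)."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)

for line in (-130, -220, -250):
    print(line, round(implied_prob(line), 3))
```

On this scale the Guardians' line implies roughly a 57% chance, against roughly 69% to 71% for the Phillies in the same ballpark.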
Conclusion
I think that this model is a nice starting point, but as I said at the beginning of the article, there are some very clear limitations. The next thing I would like to do is add starting pitchers to the model, which I’ve already begun some initial work on. Over the coming weeks I’ll post periodic updates here with the teams’ updated power ratings and information about how the model is performing.
Optimal Pitching Roster Construction
One of the unique things about fantasy baseball is that there is a lot more diversity in league formats than there is in fantasy football. While I’ve been playing fantasy football for over a decade, I’m relatively new to the world of serious fantasy baseball, so as I’ve gotten ramped up to the format recently I’ve had to learn some lessons about how to construct a roster. This post is a discussion of how to construct an optimal pitching staff in a head-to-head points league that has SP and RP caps but no innings cap. All stats I reference from here on are season-to-date as of May 26th 2023.
What can you expect from starting pitching?
One of the nice things about a starting pitcher relative to a reliever is that he never wastes a lineup spot, because you know which days he's going to pitch. With relievers, some days you insert one into your lineup only for him not to pitch, while a reliever who does pitch sits on your bench.
A healthy starter will give you 1.2 starts/week (they will make 32 starts over the course of the season and the season is 185 calendar days).
Here are some benchmarks for points/start among starting pitching.
SP1 (Shane McClanahan) 18.8 points/start
SP10 (Framber Valdez) 16.6 points/start
SP25 (Jon Gray) 13.9 points/start
SP50 (Hunter Greene) 9.5 points/start
There are loads of guys on the waiver wire averaging 7 points/start. I think that by playing the matchups well you can get 8 points/start out of the waiver wire.
What can you expect from relief pitching?
Here are some benchmarks for points/appearance among relief pitching. Keep in mind that appearances are tougher to predict for relievers than starters.
RP1 (Felix Bautista) 6.5 points/appearance
RP10 (David Robertson) 5.7 points/appearance
RP25 (Andrew Chafin) 3.8 points/appearance
There are loads of relievers available who average 3.5-4 points/appearance.
The value of streaming
From these figures we can come up with some hard numbers on the value of streaming. Suppose one option is to roster a decent starter, around SP40 or so. You can bank on him getting around 11 points/start and making 1.2 starts/week, making him worth 13.2 points/week. If instead you use that roster spot to make 5 streams at 8 points/stream, the spot is worth 40 points/week, triple what you get from the decent starter. A streaming starting spot is the single highest scoring spot on your roster- more than you'd expect from Shane McClanahan (~22 points/week) or Freddie Freeman (~24 points/week). The problem, of course, is that it costs a very valuable resource- roster moves. Choosing not to stream because you want to keep a mediocre pitcher on the roster is a fool's errand- every week you do this, you are burning about 27 points (40 - 13.2).
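The arithmetic behind the streaming comparison is sketched below, using only the figures quoted in this post (32 starts over a 185-day season, 11 points/start for a held starter, 5 streams at 8 points each). The variable and function names are mine.

```python
SEASON_STARTS = 32
SEASON_DAYS = 185

# A healthy starter's weekly workload: about 1.21, rounded to 1.2 in the text.
starts_per_week = SEASON_STARTS / (SEASON_DAYS / 7)

def weekly_points(points_per_start, starts):
    """Expected weekly points from one roster spot."""
    return points_per_start * starts

held_starter = weekly_points(11, 1.2)   # decent SP40-type starter you never drop
streaming_spot = weekly_points(8, 5)    # five waiver-wire starts in the same spot
weekly_gap = streaming_spot - held_starter

print(f"starts/week: {starts_per_week:.2f}")
print(f"held: {held_starter:.1f}, streamed: {streaming_spot:.1f}, gap: {weekly_gap:.1f}")
```

The gap is the weekly cost of holding a mediocre starter instead of streaming, before accounting for the roster moves that streaming consumes.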
The value of RP/SP
This brings me to my final topic- the famed RP/SP position. An RP/SP who starts is valuable because, unlike other relievers, you can predict when he will pitch and only start him accordingly. A replacement-level starter who has an RP/SP designation is worth 8 points/start * 1.2 starts/week = 9.6 points/week, although you do have to pay the opportunity cost of not using a proper reliever that day, which I value at approximately 1.2 times/week * 70% chance of reliever being used * 4 points/relief appearance = 3.4 points/week. Therefore, having a crappy guy as an RP/SP is worth an additional 6 points per week. 6 points per week may not sound like a lot, but that's also the difference between a hitter who gets 3 points/game and a hitter who gets 2.2 points/game, which we know to be a tremendous difference.
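The RP/SP calculation can be written out as a small function. The defaults are the numbers used in the paragraph above, and the function name is mine.

```python
def rp_sp_net_value(pts_per_start=8, starts_per_week=1.2,
                    reliever_use_rate=0.7, pts_per_relief_app=4):
    """Weekly points gained by starting a replacement-level RP/SP in an RP slot."""
    start_value = pts_per_start * starts_per_week
    # On each start day you give up a reliever who might have pitched anyway.
    opportunity_cost = starts_per_week * reliever_use_rate * pts_per_relief_app
    return start_value - opportunity_cost

print(round(rp_sp_net_value(), 1))
```

Raising `pts_per_start` shows how quickly the edge grows if the RP/SP is better than replacement level.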
A quick side note- because Shohei Ohtani does not count against the SP cap but, like an RP/SP, pitches on a predictable schedule, he also gets this benefit. I believe this fact makes Ohtani by far the most valuable asset in this scoring system.
Optimal Roster Construction
Given our constraints of 8 SP and 4 RP maximums, I think that the optimal way to construct a staff is as follows:
7 traditional SP who you do not drop- 8.4 starts/week at 13 points/start = 109 points/week. 13 points/start is equivalent to approximately SP35.
1 SP spot for streaming- 5 starts/week (I'm assuming you use 1 roster move/week on a hitter) at 8 points/start = 40 points/week
2 RP/SP- 1.2 starts/week each at 8 points/start = 19 points/week
2 traditional RP- 3 appearances/week each at 5 points/appearance = 30 points/week
In total, this gives 203 points/week from one's pitching staff if correctly managed, and this is only assuming average starting pitching. Good roster management (streaming and the RP/SP trick) takes your pitching staff from 144 points/week to 203 points/week.
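As a sanity check, the roster plan can be tallied slot by slot. This sketch reads the RP/SP and traditional RP lines as per-pitcher rates, which is the reading under which the listed 19 and 30 points/week come out. The data structure and names are mine.

```python
# (slot type, number of roster spots, appearances per spot per week, points per appearance)
roster_plan = [
    ("traditional SP", 7, 1.2, 13),
    ("streaming SP slot", 1, 5, 8),
    ("RP/SP", 2, 1.2, 8),
    ("traditional RP", 2, 3, 5),
]

def slot_points(count, apps_per_week, pts_per_app):
    """Weekly points contributed by one type of roster slot."""
    return count * apps_per_week * pts_per_app

breakdown = {name: slot_points(n, apps, pts) for name, n, apps, pts in roster_plan}
total = sum(breakdown.values())

for name, pts in breakdown.items():
    print(f"{name}: {pts:.1f}")
print(f"total: {total:.1f}")
```

Summed this way, the four line items come to about 198 points/week, a little under the 203 quoted, so the totals are best read as approximate.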