Predicting Trevor Williams’ Whiff Rate Via Machine Learning Offers Clues to His Improvement
Sequencing Could Be Key
Hey Siri, tell me how Trevor Williams can get more strikeouts this season. It wasn’t quite that easy, but I did ask my computer to consider all of the new Cubs starter’s Statcast data from 2019-2020 in an attempt to predict slider whiffs.
Specifically, I fed it the following data:
- Vertical location (plate_z)
- Horizontal location (plate_x)
- Velocity
- Spin Rate
- Horizontal release point
- Vertical release point
- Horizontal movement (pfx_x)
- Vertical movement (pfx_z)
- Release extension
- Previous pitch
- Previous two pitches
- Pitch count
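The last few inputs on that list are not raw Statcast readings; they have to be derived from the order in which pitches were thrown. Here is a minimal sketch of how that might be done, not the code behind this article. The function name, the dictionary keys, and the "FF>CH" label format are all made up for illustration, and it assumes the pitches of one plate appearance arrive in chronological order.

```python
from __future__ import annotations

from typing import Optional


def add_previous_pitches(pitch_types: list[str]) -> list[dict[str, Optional[str]]]:
    """Attach the previous pitch and the previous two pitches to each pitch.

    Assumes `pitch_types` holds one plate appearance in chronological order.
    """
    rows = []
    for i, pitch in enumerate(pitch_types):
        prev1 = pitch_types[i - 1] if i >= 1 else None
        prev2 = pitch_types[i - 2] if i >= 2 else None
        rows.append({
            "pitch": pitch,
            "prev_pitch": prev1,
            # Two-pitch history, oldest first; None until two pitches exist.
            "prev_two": f"{prev2}>{prev1}" if prev2 is not None else None,
        })
    return rows


# Hypothetical sequence: four-seamer, changeup, slider, slider.
rows = add_previous_pitches(["FF", "CH", "SL", "SL"])
for row in rows:
    print(row)
```

The first pitch of a plate appearance has no history, so its sequencing fields stay empty rather than borrowing from the previous batter.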
The computer figured out which numbers mattered most and identified cutoff points for classifying a pitch as a whiff, resulting in a tree-shaped figure and an algorithm. I then tested the algorithm on half of Williams’ data and found that it predicts whether a slider draws a whiff with about 86% accuracy.
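To give a feel for what "identifying a cutoff point" means, here is a rough sketch of a single branch of such a tree. It is an illustration under loud assumptions, not the model used here: the data are synthetic (low pitches whiff more often by construction), the split criterion is Gini impurity, which is one common choice, and the helper names are invented. It fits the cutoff on one half of the data and scores it on the other half.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return 1 if sum(labels) * 2 >= len(labels) else 0


def fit_stump(values, labels):
    """Fit a one-split tree: (cutoff, label if v <= cutoff, label if v > cutoff)."""
    best, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate cutoffs are midpoints between neighboring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best_score:
            best_score = score
            best = (t, majority(left), majority(right))
    return best


# Synthetic data, NOT Williams' pitches: whiffs are likely below 1.8 ft.
rng = random.Random(0)
plate_z = [rng.uniform(0.5, 3.5) for _ in range(400)]
whiff = [1 if rng.random() < (0.9 if z < 1.8 else 0.05) else 0 for z in plate_z]

# Fit on the first half, test on the second half.
half = len(plate_z) // 2
cutoff, low_label, high_label = fit_stump(plate_z[:half], whiff[:half])
predictions = [low_label if z <= cutoff else high_label for z in plate_z[half:]]
accuracy = sum(p == y for p, y in zip(predictions, whiff[half:])) / half
print(f"cutoff={cutoff:.2f} ft, test accuracy={accuracy:.1%}")
```

A full tree repeats this search across every input and then again inside each branch, which is how variables like pitch count and previous pitch can end up stacked beneath a location split.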
The most predictive Statcast metrics for his slider were vertical pitch location, horizontal movement, vertical movement, pitch count, and previous pitch. That part isn’t very novel: of course low sliders with more horizontal and vertical break lead to more swings and misses.
What’s a little more interesting is that sliders thrown after four-seam fastballs in counts with fewer than two strikes also stood out as predictive. This suggests sequencing plays a vital role and could explain how Williams achieved a league-average slider whiff rate despite below-average movement numbers.
It’s important to remember that this predictive method only considers the variability in Williams’ repertoire. For example, just because release point wasn’t a determinant factor for whiffs doesn’t mean it is irrelevant. It just means, at least in this case, that the variation in his release point wasn’t predictive.
Still, maybe the Cubs can tinker with his release point, which is lower than that of 80% of MLB pitchers, to optimize movement. They can also dig further into his sequencing to promote better tunneling, among other things. More movement and better location with careful sequencing should lead to more whiffs.
I personally think it’s informative that we can predict these results with 86% accuracy, but 14% is still a lot of error. It’s up to the Cubs coaches to figure out what contributes to that error and do their best to eliminate as much of it as possible. That should be exciting for fans because it shows there’s real room for improvement.
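One way to start picking that error apart is to separate the two kinds of misses: sliders the model expected to get a whiff that didn't, and whiffs it never saw coming. The sketch below uses invented counts that merely happen to add up to 86% accuracy; they are not results from this analysis, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    true_whiff: int      # predicted whiff, was a whiff
    true_no_whiff: int   # predicted no whiff, was not a whiff
    false_whiff: int     # predicted whiff, was not a whiff
    missed_whiff: int    # predicted no whiff, was a whiff

    @property
    def total(self) -> int:
        return (self.true_whiff + self.true_no_whiff
                + self.false_whiff + self.missed_whiff)

    @property
    def accuracy(self) -> float:
        return (self.true_whiff + self.true_no_whiff) / self.total

    @property
    def error_rate(self) -> float:
        return (self.false_whiff + self.missed_whiff) / self.total

    @property
    def missed_share_of_errors(self) -> float:
        """Fraction of all errors that were whiffs the model failed to predict."""
        return self.missed_whiff / (self.false_whiff + self.missed_whiff)


# Made-up counts for 200 test sliders, chosen only to total 86% accuracy.
result = Confusion(true_whiff=30, true_no_whiff=142, false_whiff=12, missed_whiff=16)
print(f"accuracy={result.accuracy:.0%}, error={result.error_rate:.0%}")
print(f"missed whiffs as share of errors={result.missed_share_of_errors:.0%}")
```

If most of the error turned out to be unpredicted whiffs, that would hint at something the inputs don't capture; if it were mostly false alarms, the cutoffs themselves might be too generous.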
Considering what the Cubs have accomplished lately under the guidance of Tommy Hottovy and Craig Breslow, I’m optimistic about Williams’ improvement as he joins the rotation.