MLB Line Projections – 18 July 21

These are statistical projections and shouldn’t be the only factor in betting on a team; the stats only tell part of the story. Keep an eye on injuries, COVID issues, and players returning from injury.

Like what you see? Please subscribe or follow me on Twitter (@AnalyticsB2) for the latest news and post info. If you want to support the statistical data or have suggestions for improvement, feel free to send me an email (B2SportsStats@gmail.com), or you can donate via the website, Venmo (@B2stats), or Cashapp ($B2stats).

Coming Soon: Working to add more recent trends info such as ERA/FIP/xERA over last 6 starts, Home/Away records last 10, Over/Under records last 10, etc.


All doubleheaders are accounted for as 7-inning games.

I will post my top model that I use daily (RF), along with the top 2 performing models.

Pitchers, Spider Tack, & Spin Rate

Note: Now that pitchers are on their fourth or fifth start without sticky substances, things are starting to return to normal. Although there is some regression for a few pitchers, the only noticeable, tangible difference between before and now is that strikeout rates are down for pitchers like Gerrit Cole. I will still track this and update it periodically, but as of today I don’t think it will provide a very significant betting edge. More to come….

For those not following along, MLB has recently faced controversy over pitchers using a sticky substance called Spider Tack to increase the spin rate of their pitches and, as a result, the movement of the ball, making it harder for batters to hit. As this blew up over the last couple of weeks, pitchers have reportedly been reluctant to use the substance, and some of their spin rates (and in some cases their performance) have fallen off. The decrease in performance could be tied to a player having stopped using the substance, or it could be due to the player handling the attention poorly; either way, it gives a betting edge in my opinion. This is pure speculation on my part, but if a great pitcher is now average because they stopped using the “secret stuff,” then that gives me a betting edge and I will take full advantage of it.

So here is a list of some players that have been tied to a decrease in spin rate since the controversy hit a turning point and some comments on recent performance.

  • Rumored Pitchers: Trevor Bauer (LAD), Gerrit Cole (NYY), Brandon Woodruff (MIL), Corbin Burnes (MIL), Nick Pivetta (BOS), Shane Bieber (CLE), E. Rodriguez (BOS)
  • Rumored Pitchers: Peralta (MIL), Eovaldi (BOS), Wainwright (STL), Fried (ATL), Kuhl (PIT), Glasnow (TBR), Rodon (CHW), Kluber (NYY), Chapman (NYY)…

Another thing to keep in mind: this can really affect the Over/Under in games. More players are using it than we know, and if they stop and start giving up more runs, it can be an over bettor’s dream.

Why does it Matter? Check out this SI Article or this TheScore Article on how it could affect a game.

Summary of Projections

Model Record

Model Rank

Consensus Record

Consensus Profitability

Model Consensus

Sharp Report

ML – Some Sharp money has come in on the Marlins, Twins, Orioles, Rockies, & D-Backs.

O/U – Some Sharp money has come in on the Astros/White Sox Over 8, Giants/Cardinals Under 9, Dodgers/Rockies Over 11.5, Cubs/D-Backs Over 9, & Red Sox/Yankees Under 9.5.

Team Variance

Updated every Sunday.

This chart shows each team’s Variance and Standard Deviation (Std Dev). Both are calculated from season stats (they need a larger sample size than 8 games). The list is ordered from lowest team variance to highest. Variance measures how widely spread the data is: a team that scores between 1 and 2 runs every night will have very low variance, whereas a team that scores anywhere from 0 to 10 runs on a given night will have very high variance. The Std Dev is the square root of the variance and is a good measure of how consistent a team is… NOT how good or bad a team is, but how consistent it is. Std Dev shows how much a team’s score typically deviates from its average on a given night. The Mariners’ score will typically deviate about 2.50 runs from their average on a given night, whereas the Padres’ score will typically deviate 3.83 runs from theirs. Obviously, the lower the Std Dev, the easier it is for my models to project the score and provide higher probabilities.
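To make the two measures concrete, here is a minimal Python sketch using made-up run totals (not real team data). It assumes the population variance; the chart may well use the sample variance instead, which would give slightly larger numbers. The function name `run_spread` is hypothetical.

```python
from statistics import mean, pstdev, pvariance


def run_spread(runs_per_game):
    """Return (average, variance, std dev) for a list of runs scored per game."""
    avg = mean(runs_per_game)
    var = pvariance(runs_per_game)  # population variance (an assumption)
    sd = pstdev(runs_per_game)      # square root of the variance
    return avg, var, sd


# Made-up scoring logs for illustration only.
steady_team = [1, 2, 1, 2, 1, 2]     # scores 1 or 2 runs every night
streaky_team = [0, 10, 2, 8, 1, 9]   # scores anywhere from 0 to 10

for name, log in (("steady", steady_team), ("streaky", streaky_team)):
    avg, var, sd = run_spread(log)
    print(f"{name}: avg={avg:.2f} variance={var:.2f} std dev={sd:.2f}")
```

The steady team ends up with a tiny Std Dev and the streaky team with a large one, even though neither number says anything about which team is better.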

Team Trends

Home/Away Records

[MOV = Margin of Victory] This chart shows a team’s record at home vs. on the road, for both Moneyline and Over/Under. This info is critical for identifying teams to play or fade at home vs. on the road. O/U record is displayed as “Overs-Unders-Ties.” Each of these also produces an expected win % based on each team’s home vs. away stats. Expected win % is a good way to adjust for those few lucky bounces that skew a team’s record. The W% Diff is the difference between win % and expected win %: a positive W% Diff means the team is actually better than its record indicates, and a negative W% Diff means the team has actually been worse than its record indicates.
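The post does not say which formula produces the expected win %. One common choice is the Pythagorean expectation from runs scored and runs allowed, so the sketch below assumes that formula with the classic exponent of 2. The function names and the sample numbers are hypothetical. The sign of the difference follows the description above: positive when the team has played better than its record.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected win % from runs scored/allowed (assumed formula, not confirmed by the post)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def win_pct_diff(wins, losses, runs_scored, runs_allowed):
    """Expected win % minus actual win %; positive = better than the record shows."""
    actual = wins / (wins + losses)
    expected = pythagorean_win_pct(runs_scored, runs_allowed)
    return expected - actual


# Made-up home split: a 20-25 record while outscoring opponents 200-190.
diff = win_pct_diff(wins=20, losses=25, runs_scored=200, runs_allowed=190)
print(f"W% Diff: {diff:+.3f}")  # positive, so the record undersells this team
```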

Records After a Day Off

Teams with a day off: Tex & Tor

[MOV = Margin of Victory] Another good trend to take advantage of is a team coming off a day of rest. Some teams are unstoppable with a day of rest, while others must have spent that day partying or something because they simply can’t win after a day off.

Run in the 1st Inning

A popular betting option for some is a team to score a run in the first inning. Here is a breakdown of each team’s percentage of scoring in the first inning, for the total record and split by home vs. away. Also included is the percentage of runs allowed in the first inning by a given team. The “Prob of Scoring in 1st” column is calculated using the percent the Home/Away team scores in the 1st vs. the percent their opponent has allowed a run in the 1st Away/at Home. Of course, this doesn’t tell the entire story, and the starting pitcher has a lot to do with a team scoring in the first.
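The exact way the two percentages are combined is not spelled out here, so the following is only an illustration. It assumes a simple average of the offense’s first-inning scoring rate and the opponent’s first-inning allowed rate; the real column may use a different blend. The function name and the rates are made up.

```python
def first_inning_score_prob(team_score_rate, opp_allow_rate):
    """Average a team's 1st-inning scoring rate with the opponent's 1st-inning allowed rate.

    Both inputs are fractions between 0 and 1. Averaging them is an
    assumption for illustration, not the post's confirmed method.
    """
    for rate in (team_score_rate, opp_allow_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    return (team_score_rate + opp_allow_rate) / 2


# Home team scores in the 1st in 30% of home games; the visiting team
# allows a 1st-inning run in 40% of its road games (made-up numbers).
prob = first_inning_score_prob(0.30, 0.40)
print(f"Prob of Scoring in 1st: {prob:.1%}")
```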

Starting Pitcher Expected Regression

Note: Not all pitchers are in the data source used for xERA, which results in a #Value! error for those pitchers.

Note: I shortened this chart to just have ERA+ expected win %; the other ERA+ numbers in this context aren’t super useful in my opinion. I am planning on adding a last X number of starts or last X weeks ERA/FIP/xERA to another chart here so you can see how a pitcher has done lately with regard to ERA/FIP/xERA. Working through formatting issues with Python at the moment (note date: 23 June).

This chart shows each of today’s starting pitchers’ ERA along with expected and advanced stats. Basically, a pitcher with a higher ERA but a lower FIP and Expected ERA (xERA) has performed better than his ERA indicates. Conversely, a pitcher with a low ERA but a higher FIP and xERA has benefitted from a few “Lucky Bounces” and has not been as good as his ERA indicates. ERA+ compares a pitcher’s ERA to that of other pitchers in the league, adjusted for ballpark-related factors.

FIP (Fielding Independent Pitching) is a way to project a pitcher’s ERA taking into account only what the pitcher can control. This is the most common way for stat nerds like me to see whether a pitcher is pitching better or worse than traditional stats suggest. If the FIP is lower than the ERA, I expect the pitcher to pitch better than his past performances; if the FIP is higher than the ERA, I expect the pitcher to be worse than his past performances. ERA Value is ERA minus FIP: positive means FIP is better than ERA, and negative means FIP is worse than ERA.
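For reference, the standard FIP formula uses only home runs, walks, hit batters, and strikeouts, plus a league constant that puts it on the ERA scale. The sketch below uses that standard formula; the default constant of 3.10 is a typical value and an assumption here, since the real constant changes every season. The function names and the stat line are made up.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Standard FIP: (13*HR + 3*(BB+HBP) - 2*K) / IP + league constant.

    `ip` is innings pitched as a true decimal (90 1/3 innings is 90.333,
    not the box-score shorthand 90.1). The default constant is a typical
    value, not the exact figure for any particular season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def era_value(era, fip_value):
    """ERA minus FIP; positive means FIP is better (lower) than ERA."""
    return era - fip_value


# Made-up stat line: 10 HR, 25 BB, 3 HBP, 100 K over 90 innings with a 4.00 ERA.
pitcher_fip = fip(hr=10, bb=25, hbp=3, k=100, ip=90.0)
print(f"FIP: {pitcher_fip:.2f}  ERA Value: {era_value(4.00, pitcher_fip):+.2f}")
```

A positive ERA Value like the one in this example flags a pitcher I would expect to improve on his ERA going forward.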

From ERA+ we can also get an expected win percentage, which is another good place to look for expected regression. For example, if a pitcher is 4-0 but has an expected win % of 50%, he should probably be 2-2 and may have benefitted from a lot of run support or a bit of luck.

Expected ERA (xERA) is based on expected weighted on-base average (xwOBA) and is converted to ERA form. It is formulated using exit velocity, launch angle and, on certain types of batted balls, Sprint Speed. It accounts for things like how hard batters are hitting a certain pitcher. The idea is that if a pitcher gets hit incredibly hard, but right at a fielder, for an entire game, he is a bit lucky and his stats will look better than they should. Next game I would expect some of those hard-hit balls to find gaps or go over the fence and the pitcher to perform worse. Conversely, for a pitcher who gives up a lot of hits in a game on weak contact like bloop shots, dribblers down the line, etc., the xERA will be lower than the actual ERA, and I would expect that next game the batters won’t get as lucky with weak contact finding holes to fall into.

Bullpen Projected Runs

The Bullpen Runs per 9 chart shows how many runs each model projects a given team’s bullpen will give up over the course of 9 innings (essentially projected bullpen ERA). All of the bullpen stats are based on all non-starting pitchers for a given team and pull stats from the last 12 games (the same number of games the normal model projections use). To project the bullpen stats, I use league-average stats for all batting statistics vs. that team’s bullpen. The higher the number, the more runs a bullpen is projected to give up and the worse it is. Because I use the last 12 games, some stats may be inflated (like giving up 20+ runs over 9 innings), but you get the point: they are bad. You may also see negative numbers for some model projections. That’s because a bullpen could have given up 0 runs against the best offenses and is now going up against a league-average one, so the model basically thinks it will do even better against a worse offense and doesn’t know the number of runs can’t be less than 0. Fading bad bullpens is a great in-play opportunity: when a team with a bad bullpen has the lead and pulls its starter, making an ML play on the other side can be very profitable.

Model Projections

My Model Choice:

Random Forest

Undecided Pitchers default to league average stats.

For Pitchers with inflated stats (i.e. other team projected to score something like 20 runs), I use the expected stats table to adjust for the inflated projections.

How to read the projections: The model type is at the very top. Matchups are denoted by the alternating white/green pairings; away teams are on top, home teams on the bottom. The model analyzes the matchup and projects the home and away teams’ scores (Proj Score). The difference between the home team’s score and the away team’s score gives us what the model says the line should be (Proj Line). The “Line” column is the Vegas line at the time I run the model. The “Line Diff” is the difference between the projected line and the Vegas line. A positive Line Diff means the projected line is in the away team’s favor compared to the Vegas line; a negative one means the projected line is in the home team’s favor. The “Cover Prob” uses a normal distribution and the team’s variance to project each team’s probability of covering the listed Vegas line. The totals work the same way: “Proj Total” is the sum of the projected scores, “Line Total” is the listed Vegas total for the game, followed by the difference between the two and the probability of going over or staying under, denoted by “O” and “U”.
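As a rough illustration of how a Cover Prob can come out of a normal distribution, here is a minimal Python sketch. Several things in it are assumptions rather than the model’s actual internals: the two teams’ scores are treated as independent, so their variances add; the spread is quoted from the home team’s side in the usual sportsbook convention (negative when the home team is favored); and all function names and numbers are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def combined_sigma(sd_home, sd_away):
    """Std dev of a sum or difference of two independent scores (an assumption)."""
    sigma = math.sqrt(sd_home ** 2 + sd_away ** 2)
    if sigma <= 0:
        raise ValueError("combined std dev must be positive")
    return sigma


def home_cover_prob(proj_home, proj_away, home_spread, sd_home, sd_away):
    """Probability the home team covers a spread such as -1.5."""
    proj_margin = proj_home - proj_away  # home score minus away score
    sigma = combined_sigma(sd_home, sd_away)
    # The home team covers when (margin + spread) > 0, i.e. margin > -spread.
    return 1.0 - normal_cdf(-home_spread, proj_margin, sigma)


def over_prob(proj_home, proj_away, line_total, sd_home, sd_away):
    """Probability the combined score goes over the listed total."""
    proj_total = proj_home + proj_away
    sigma = combined_sigma(sd_home, sd_away)
    return 1.0 - normal_cdf(line_total, proj_total, sigma)


# Made-up matchup: home projected 5.2, away 4.1, home -1.5, total 9.
p_home = home_cover_prob(5.2, 4.1, -1.5, sd_home=2.5, sd_away=3.8)
p_over = over_prob(5.2, 4.1, 9.0, sd_home=2.5, sd_away=3.8)
print(f"Home covers: {p_home:.1%}  Away covers: {1 - p_home:.1%}")
print(f"O: {p_over:.1%}  U: {1 - p_over:.1%}")
```

Because the combined Std Dev sits in the denominator, two low-variance teams push the probabilities further from 50% for the same projected gap, which is why consistent teams are easier to project.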

Betting Edge

If you don’t understand what you are looking at, I recommend reading my post about betting tips. The percentages show the betting Edge, which is the Cover Prob (from above) minus the implied probability (the implied probability of -110 odds is 52.4%). If a team has a 92.4% chance to win and a 52.4% implied probability (-110 odds), the Edge is 40%. The Edge alone doesn’t mean you should blindly bet it. To quantify: >35% is great value, 20-35% is really good, 10-20% is decent, <10% is OK value, and blanks are negative value. ML is moneyline.
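The Edge arithmetic can be sketched in a few lines of Python. The conversion from American odds to implied probability is the standard one; the function names are hypothetical, and how the exact boundary values (10%, 20%, 35%) are bucketed is an assumption, since the ranges above overlap at the edges.

```python
def implied_prob(american_odds):
    """Implied probability of American odds (-110 gives about 52.4%)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be 0")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def betting_edge(cover_prob, american_odds=-110):
    """Edge = model probability minus the implied probability of the odds."""
    return cover_prob - implied_prob(american_odds)


def value_tier(edge):
    """Map an Edge (as a fraction) to the value labels described above."""
    if edge <= 0:
        return ""  # negative value is left blank
    if edge > 0.35:
        return "great"
    if edge >= 0.20:
        return "really good"
    if edge >= 0.10:
        return "decent"
    return "ok"


# The example from the text: a 92.4% probability at -110 odds.
edge = betting_edge(0.924, -110)
print(f"Edge: {edge:.1%} ({value_tier(edge)} value)")
```

The same functions work for moneylines: pass the actual ML price (for example +150 or -180) instead of the default -110.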

Top Performing Models:

1. Adaptive Boosting (Ada Boost)

Betting Edge

2. Support Vector Machine (SVM)

Betting Edge
