MLB Line Projections – 20 May 21

These are statistical projections and shouldn’t be the only factor in betting on a team; the stats tell only part of the story. Keep an eye on injuries, COVID issues, and players returning from injury.

Like what you see? Please subscribe or follow me on Twitter (@AnalyticsB2) for the latest news and post info. If you want to support the statistical data or have suggestions for improvement, feel free to send me an email (B2SportsStats@gmail.com), or you can donate via the website, Venmo (@B2stats), or Cashapp ($B2stats).

Next Project: Projections for MLB DFS. Look for periodic updates via Twitter. I’m shooting for a mid-May release.


All doubleheaders are treated as 7-inning games.

I will post the model I use daily (RF, Random Forest), along with the top 2 performing models.

Summary of Projections

Model Record

Model Rank

Consensus Record

Consensus Profitability

Model Consensus

Sharp Report

ML – Some Sharp money has come in on the Rangers, Orioles, Giants, Cubs, & Angels (game 1).

O/U – Some Sharp money has come in on the Yankees/Rangers Under 8.5, Nationals/Cubs Under 10.5, Marlins/Phillies Over 8, & Pirates/Braves Over 8.5.

Team Variance

Updated every Sunday.

This chart shows each team’s variance and standard deviation (Std Dev). Both are calculated from the season stats (they need a larger sample size than 8 games). The list is ordered from lowest team variance to highest. Variance measures how widely spread the data is: a team that scores between 1 and 2 runs every night will have very low variance, whereas a team that scores anywhere from 0 to 10 runs on a given night will have very high variance. The Std Dev is the square root of the variance and is a good measure of how consistent a team is… NOT how good or bad a team is, but how consistent they are. Std Dev shows how much a team’s score typically deviates from its average on a given night. The Mets’ score will typically deviate about 2.15 runs from their average on a given night, whereas the Reds’ score will typically deviate about 4.10 runs from their average. Obviously, the lower the Std Dev, the easier it is for my models to project the score and provide higher probabilities.
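
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the variance and Std Dev calculation. The run totals are made up for illustration, and the use of the population formulas (rather than the sample formulas) is an assumption; the chart itself may be built differently.

```python
from statistics import mean, pstdev, pvariance


def run_spread(runs):
    """Return (average, variance, std dev) for a list of runs scored per game.

    Uses the population formulas; that choice is an assumption, not
    necessarily what the chart uses.
    """
    return mean(runs), pvariance(runs), pstdev(runs)


# Hypothetical run totals, invented for illustration only.
steady_team = [1, 2, 1, 2, 2, 1, 2, 1]     # scores 1 or 2 runs every night
streaky_team = [0, 10, 3, 7, 1, 9, 0, 6]   # anywhere from 0 to 10 runs

steady = run_spread(steady_team)
streaky = run_spread(streaky_team)

print("steady  avg=%.2f var=%.2f sd=%.2f" % steady)
print("streaky avg=%.2f var=%.2f sd=%.2f" % streaky)
```

The steady team ends up with a much smaller Std Dev than the streaky one, even though neither number says anything about which team is better.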

Starting Pitcher Expected Regression

Note: Not all pitchers are in the data source used for xERA, which results in a #Value! error for those pitchers.

This chart shows the ERA of each of today’s starting pitchers along with expected and advanced stats. Basically, a pitcher with a higher ERA but a lower FIP and expected ERA (xERA) has pitched better than his ERA indicates. Conversely, a pitcher with a low ERA but a higher FIP and xERA has benefited from a few “lucky bounces” and has not been as good as his ERA indicates. ERA+ compares a pitcher’s ERA to that of other pitchers in the league, adjusted for ballpark-related factors.

FIP (Fielding Independent Pitching) estimates a pitcher’s ERA taking into account only what the pitcher can control. This is the most common way for stat nerds like me to see if a pitcher is pitching better or worse than traditional stats suggest. If the FIP is lower than the ERA, I expect the pitcher to pitch better than his past performances; if the FIP is higher than the ERA, I expect the pitcher to be worse than his past performances. ERA Value is ERA minus FIP: positive means FIP is better (lower) than ERA, negative means FIP is worse (higher) than ERA.
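
As a rough illustration of ERA Value, the sketch below computes FIP with the standard public formula and then subtracts it from ERA. The stat line is invented, and the FIP constant of 3.10 is an assumption (the real constant changes a little every season); the chart’s data source may compute things slightly differently.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters,
    strikeouts, and innings pitched.

    ip should be in true decimal innings (45 1/3 innings is 45.333, not 45.1).
    fip_constant is an assumed value; the league constant varies by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def era_value(era, fip_value):
    """ERA minus FIP. Positive: FIP is lower (better) than ERA."""
    return era - fip_value


# Hypothetical pitcher, numbers invented for illustration only.
pitcher_fip = fip(hr=5, bb=12, hbp=2, k=50, ip=45.0)
value = era_value(4.20, pitcher_fip)

print(f"FIP {pitcher_fip:.2f}, ERA Value {value:+.2f}")
```

A positive ERA Value here would point to a pitcher whose ERA is likely to come down; a negative one points the other way.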

In short, ERA+ essentially adjusts for the ballpark and compares the pitcher to the league average. An ERA+ of 100 indicates the league average after adjusting for ballpark; above 100 is better than average and below 100 is worse than average. To make sense of what’s displayed, the ERA+ Percentage shows how much better (or worse, if the number is negative) the starter’s ERA is compared to the league average ERA, assuming all ballparks are exactly the same. The percentages may seem odd at first when you see a guy is 200% better or worse than the league average, so if you need some additional help understanding, here is a short, quick, easy-to-understand explanation. This is a good way to factor in something like a Rockies pitcher who typically pitches in Home Run City (Denver, Colorado) but today is on the road in a more average ballpark. From ERA+ we can also get an expected win percentage, which is good for spotting expected regression too. For example, if a pitcher is 4-0 but has an expected win % of 50%, he should probably be 2-2 and may have benefited from a lot of run support or a bit of luck.
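
Here is a simplified Python sketch of the ERA+ idea. It uses the common form ERA+ = 100 × (league ERA ÷ pitcher ERA) with a single park factor applied to the league ERA. The park_factor parameter and the “ERA+ minus 100” reading of the ERA+ Percentage are assumptions for illustration, not necessarily how the chart is built.

```python
def era_plus(era, league_era, park_factor=1.0):
    """Simplified ERA+: 100 is league average, higher is better.

    park_factor is a hypothetical multiplier (1.0 = neutral park,
    above 1.0 = hitter-friendly) applied to the league ERA.
    """
    return 100 * league_era * park_factor / era


def era_plus_pct(era_plus_value):
    """Percent better (positive) or worse (negative) than league average.

    Assumes the percentage is simply ERA+ minus 100.
    """
    return era_plus_value - 100


# Hypothetical numbers, invented for illustration only.
ace = era_plus(era=2.00, league_era=4.00)        # half the league ERA
struggling = era_plus(era=5.00, league_era=4.00)

print(ace, era_plus_pct(ace))
print(struggling, era_plus_pct(struggling))
```

With a hitter-friendly park factor above 1.0, the same ERA produces a higher ERA+, which is the adjustment that gives a Rockies pitcher credit for his home park.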

Expected ERA (xERA) is based on expected weighted on-base average (xwOBA) and is converted to ERA form. It is formulated using exit velocity, launch angle and, on certain types of batted balls, Sprint Speed. It accounts for things like how hard batters are hitting a certain pitcher. The idea is that if a pitcher gets hit incredibly hard but right at a fielder for an entire game, he is a bit lucky and his stats will look better than they should. Next game I would expect some of those hard-hit balls to find gaps or go over the fence, and the pitcher to perform worse. Conversely, for a pitcher who gives up a lot of hits in a game on weak contact like bloop shots, dribblers down the line, etc., the xERA will be lower than the actual ERA, and I would expect that next game the batters won’t get as lucky with weak contact finding holes to fall into.

Model Projections

My Model Choice:

Random Forest

Padres & Yankees are having COVID issues with some key players, which won’t show up fully in the stats.

Undecided Pitchers default to league average stats.

For pitchers with inflated stats (e.g., the other team is projected to score something like 20 runs), I use the expected stats table to adjust the inflated projections.

No lineup for the ARI/LAD game.

How to read the projections: The model type is at the very top. Matchups are denoted by the alternating white/green pairings; away teams are on top, home teams on the bottom. The model analyzes the matchup and projects the home and away teams’ scores (Proj Score). The difference between the home team’s score and the away team’s score gives us what the model says the line should be (Proj Line). The “Line” column is the Vegas line at the time I run the model. The “Line Diff” is the difference between the projected line and the Vegas line. A positive line diff means the projected line is in the away team’s favor compared to the Vegas line; negative means the projected line is in the home team’s favor. The “Cover Prob” uses a normal distribution and the teams’ variance to project each team’s probability of covering the listed Vegas line. The same goes for totals: “Proj Total” is the sum of the projected scores, “Line Total” is the listed Vegas total for the game, followed by the difference between the two and the probability of going over or staying under, denoted by “O” and “U”.
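
To make the columns concrete, here is a hedged Python sketch of how numbers like these could be produced. Several things are assumptions rather than the actual spreadsheet logic: the line is written from the home team’s side (negative means the home team is favored), the two teams’ scores are treated as independent so their variances add, and all inputs are invented.

```python
from math import sqrt
from statistics import NormalDist


def project_game(away_score, home_score, away_sd, home_sd,
                 vegas_home_line, line_total):
    """Build Proj Line, Line Diff, Cover Prob and totals for one matchup.

    Assumptions: lines are quoted for the home team (negative = home
    favored), and team scores are independent, so variances add.
    """
    proj_line = away_score - home_score          # negative = home favored
    line_diff = proj_line - vegas_home_line      # positive = away team's favor
    combined_sd = sqrt(away_sd ** 2 + home_sd ** 2)

    # Home margin of victory modeled as a normal distribution.
    margin = NormalDist(mu=home_score - away_score, sigma=combined_sd)
    home_cover = 1 - margin.cdf(-vegas_home_line)

    # Total runs modeled the same way.
    proj_total = away_score + home_score
    total = NormalDist(mu=proj_total, sigma=combined_sd)
    over_prob = 1 - total.cdf(line_total)

    return {
        "proj_line": proj_line,
        "line_diff": line_diff,
        "home_cover": home_cover,
        "away_cover": 1 - home_cover,
        "proj_total": proj_total,
        "over": over_prob,
        "under": 1 - over_prob,
    }


# Hypothetical matchup, numbers invented for illustration only.
game = project_game(away_score=3.8, home_score=5.0,
                    away_sd=2.15, home_sd=4.10,
                    vegas_home_line=-1.5, line_total=8.5)
for name, number in game.items():
    print(f"{name}: {number:.3f}")
```

In this made-up game the model has the home team winning by 1.2 runs while Vegas has them at -1.5, so the line diff is positive (away team’s favor) and the home team’s cover probability comes out under 50%.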

Betting Edge

If you don’t understand what you are looking at, I recommend reading my post about betting tips. The percentages show the betting Edge, which is the cover prob (from above) minus the implied probability (the implied probability of -110 odds is 52.4%). If a team has a 92.4% chance to win and a 52.4% implied probability (-110 odds), the Edge is 40%. The Edge alone doesn’t mean you should blindly bet it. To quantify: >35% is great value, 20-35% is really good, 10-20% is decent, <10% is OK value, and blanks are negative value. ML is moneyline.
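
The Edge arithmetic can be sketched in a few lines of Python. The conversion from American odds to implied probability is the standard one; the function names and the exact handling of the tier boundaries are illustrative assumptions.

```python
def implied_prob(american_odds):
    """Convert American odds to the implied (break-even) probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def betting_edge(cover_prob, american_odds=-110):
    """Edge = model cover probability minus implied probability."""
    return cover_prob - implied_prob(american_odds)


def value_tier(edge):
    """Map an Edge (as a fraction) to the value tiers described above.

    Boundary handling (e.g. exactly 20%) is an assumption.
    """
    if edge <= 0:
        return ""            # blank = negative value
    if edge > 0.35:
        return "great"
    if edge >= 0.20:
        return "really good"
    if edge >= 0.10:
        return "decent"
    return "ok"


edge = betting_edge(0.924, -110)
print(f"implied {implied_prob(-110):.1%}, edge {edge:.1%}, tier {value_tier(edge)}")
```

Running it on the example from the text (92.4% cover probability at -110) gives an implied probability of about 52.4% and an Edge of about 40%.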

Top Performing Models:

1. Neural Net

Betting edge

2. Ada Boost

Betting edge
