[Disclaimer] Each team has only played a handful of games, which limits the amount and quality of the stats (and these are stat-based projections). This makes the first month of projections WILD. Best of luck!
These are statistical projections and shouldn’t be the only thing that factors into betting on a team; the stats only tell part of the story. Keep an eye on injuries, COVID issues, and players returning from injury.
*Only RF projections are available until I finish fixing the rest of the code.
How it Works
All training data is pulled from every game played this season. The model is built around the stats allowed per 9 innings by the starting pitcher and the bullpen. For each game in the training data, it takes that game’s starter’s stats against the opposing team’s batting stats averaged over the last 10 games, with all stats adjusted to a per 9 inning basis. In English, that means that if a starting pitcher pitches 3 innings and gives up 3 hits and 1 run to a great batting team, then extrapolating over an entire game, the starter would allow 9 hits and 3 runs against a great hitting team (the same way ERA is calculated). The same process is used for the bullpen stats. Using per 9 inning stats makes it easy to adjust for the different number of innings pitched by each starter.
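The per 9 inning adjustment described above can be sketched in a few lines. This is only an illustration, not the actual model code: the function name per_nine is made up, and it assumes innings pitched are given as a true fraction (so 5 and 1/3 innings is 5.333, not the box score shorthand 5.1).

```python
def per_nine(stat_total: float, innings_pitched: float) -> float:
    """Scale a raw counting stat (hits, runs, ...) to a per-9-innings rate.

    Hypothetical helper for illustration; assumes innings_pitched is a
    true fraction of innings, not box score notation.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return stat_total * 9.0 / innings_pitched


# The example from the text: 3 innings pitched, 3 hits and 1 run allowed.
hits_per_9 = per_nine(3, 3)  # 9.0 hits per 9 innings
runs_per_9 = per_nine(1, 3)  # 3.0 runs per 9 innings
```

The same scaling would apply to the bullpen’s combined line for a game.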
Some things to consider for MLB projections: The batting stats are based on team stats from the last 10 games so, like the NBA, they don’t account for recent injuries or returns from injury. Bullpen stats are based on all non-starting pitchers who appear in a given game (i.e. if Kershaw comes out of the bullpen along with 2 other relievers for the Dodgers, all 3 of their stats will be averaged and listed as the Dodgers’ bullpen stats for that game). Starters who have never pitched before get the team average pitching stats by default (with no games pitched, it’s hard to tell how good or bad they will be). The stats used to project today’s scores are averaged over the last 5 starts for starters, the last 10 games for batters, and the last 10 games for the bullpen.
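A rough sketch of the averaging windows and the debut starter fallback might look like the following. The function name, the plain-list inputs, and the sample numbers are all assumptions for illustration; the real pipeline may store and weight these stats differently.

```python
from statistics import mean


def starter_rate(starter_history, team_average, window=5):
    """Average a starter's per-9 stat over his most recent starts.

    starter_history: per-9 values for one stat, oldest first (hypothetical
    input format). If the starter has never pitched, fall back to the
    team average, as described in the text.
    """
    recent = starter_history[-window:]
    if not recent:
        return team_average
    return mean(recent)


# Made-up numbers: a veteran with six starts and a debut starter.
veteran = starter_rate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], team_average=4.5)
rookie = starter_rate([], team_average=4.5)
```

Here the veteran’s value only uses his last 5 starts, while the rookie simply inherits the team average of 4.5. The same helper with window=10 would cover the batting and bullpen windows.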
Stats used: starting pitching vs opposing team batting, bullpen vs opposing team batting, ground balls vs fly balls vs line drives (both allowed by the pitcher and on average by the batting team), home vs away, day vs night games, left vs right handed pitching (starters only), and Statcast pitch and batting stats (currently broken, but close to fixed).
1. Random Forest (RF)
New Model Projections
KC/CHW is postponed.
Washington’s starter is undecided, so projections default to Washington’s average stats among all pitchers (starters and bullpen).
How to read the projections: The model type is at the very top. Matchups are denoted by the alternating white/green pairings, with away teams on top and home teams on the bottom. The model analyzes the matchup and projects the home and away teams’ scores (Proj Score). The difference between the home team’s score and the away team’s score gives us what the model says the line should be (Proj Line). The “Line” column is the Vegas line at the time I run the model. The “Line Diff” is the difference between the projected line and the Vegas line. A positive line diff means the projected line is in the away team’s favor compared to the Vegas line; a negative one means the projected line is in the home team’s favor. The “Cover Prob” uses a normal distribution and each team’s variance to project that team’s probability of covering the listed Vegas line. The same goes for totals: “Proj Total” is the sum of the projected scores, “Line Total” is the listed Vegas total for the game, followed by the difference between the two and the probability of going over or staying under, denoted by “O” and “U”.
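To make the “Cover Prob” idea concrete, here is one way a normal distribution could be used. This is a sketch under my own assumptions, not the model’s actual code: the spread is written from the home team’s side (-1.5 means the home team is favored by 1.5), the two teams’ scoring variances are treated as independent and added together, and all names and sample numbers are made up.

```python
from math import sqrt
from statistics import NormalDist


def cover_probs(proj_home, proj_away, home_spread, var_home, var_away):
    """Return (P(home covers), P(away covers)) under a normal approximation.

    Assumption: the final margin (home minus away) is normal, centered on
    the projected margin, with variance var_home + var_away.
    """
    margin = NormalDist(mu=proj_home - proj_away, sigma=sqrt(var_home + var_away))
    # The home team covers when margin + home_spread > 0.
    p_home = 1.0 - margin.cdf(-home_spread)
    return p_home, 1.0 - p_home


def over_under_probs(proj_home, proj_away, line_total, var_home, var_away):
    """Return (P(over), P(under)) for the listed total, same assumptions."""
    total = NormalDist(mu=proj_home + proj_away, sigma=sqrt(var_home + var_away))
    p_over = 1.0 - total.cdf(line_total)
    return p_over, 1.0 - p_over


# Made-up game: home projected 5.2, away 3.9, home favored by 1.5, total 8.5.
p_home, p_away = cover_probs(5.2, 3.9, -1.5, var_home=9.0, var_away=8.0)
p_over, p_under = over_under_probs(5.2, 3.9, 8.5, var_home=9.0, var_away=8.0)
```

In this made-up game the projected margin (1.3) is a bit short of the 1.5 spread, so the home team’s cover probability comes out slightly under 50%, while the projected total (9.1) is above 8.5, so the over is slightly above 50%.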
If you don’t understand what you are looking at, I recommend reading my post about betting tips. The percentages show the betting Edge, which is the cover prob (from above) minus the implied probability (the implied probability of -110 odds is 52.4%). If a team has a 92.4% chance to cover and a 52.4% implied probability (or -110 odds), the Edge is 40%. The Edge alone doesn’t mean you should blindly bet it. To quantify: >35% is great value, 20-35% is really good, 10-20% is decent, <10% is OK value, and blanks are negative value. ML is moneyline.
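The Edge arithmetic can be checked with a short sketch. The conversion from American odds to implied probability is the standard one; the function names and the exact handling of the tier boundaries (for example, whether exactly 35% counts as “great”) are my assumptions.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American odds to the break-even (implied) probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def edge(cover_prob: float, american_odds: int = -110) -> float:
    """Edge = model probability minus the implied probability of the odds."""
    return cover_prob - implied_probability(american_odds)


def value_label(edge_value: float) -> str:
    """Map an edge to the tiers in the text (boundary handling is assumed)."""
    if edge_value <= 0:
        return ""  # shown as a blank: no positive value
    if edge_value > 0.35:
        return "great"
    if edge_value >= 0.20:
        return "really good"
    if edge_value >= 0.10:
        return "decent"
    return "ok"


# The example from the text: 92.4% model probability at -110 odds.
example_edge = edge(0.924)          # about 0.40
example_label = value_label(example_edge)
```

At -110 the implied probability is 110 / 210, or about 52.4%, which is where the 40% Edge in the example comes from.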