What makes a Hall of Famer a Hall of Famer?

Around 23,000 players have played in MLB, and unsurprisingly not all of them make it to the Hall of Fame; it is considered a lifetime achievement, after all. So, who in the hell makes it? Is it juiced-up sluggers like A-Rod or diligent players like Ichiro?

In this report we use logistic regression to uncover which factors are associated with whether a player is inducted into the Hall of Fame.

So, who are these Hall of Famers?

Of the 23,000 MLB players, roughly 5,000 are eligible. Players must have played 10 seasons of Major League Baseball and have been retired for five full seasons to be considered eligible.

Of those 5,000 eligible players, only 330 have ever been voted into the Hall of Fame — about 6.6% of eligible players make it through.

That works out to only about 1.4% of all players who have ever played.
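The induction rates above follow directly from the counts quoted in the text. A minimal sketch of the arithmetic, using the rounded figures from this article rather than exact database counts:

```python
# Rounded counts quoted in the article (approximate, not exact database totals).
total_players = 23_000
eligible_players = 5_000
inducted = 330

# Share of eligible players who were inducted.
share_of_eligible = inducted / eligible_players

# Share of everyone who has ever played who was inducted.
share_of_all = inducted / total_players

print(f"Inducted share of eligible players: {share_of_eligible:.1%}")
print(f"Inducted share of all players:      {share_of_all:.1%}")
```

With these rounded inputs the two shares come out near 6.6% and 1.4%.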

So who are these 330? Let's pull them out of the pile and put them on the field — position players on the left, pitchers on the right, sized by career WAR (Wins Above Replacement).

A handful of all-time greats dominate the outer edges — Ruth, Bonds, Mays, Clemens — but the bulk of inductees cluster around a much more ordinary level of production. Career stats alone don't tell the whole story. So what does the model say matters?

All-Star selections matter — a lot.

In both of our final models, each additional All-Star selection multiplies a player's odds of induction by roughly 3.3× (pitchers) to 3.75× (position players), holding everything else constant. Consistent All-Star appearances are the single strongest predictor we found — a feedback loop where accolades during a career beget accolades after it.
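An odds ratio multiplies odds, not probabilities, so its effect on the chance of induction depends on where a player starts. The sketch below shows how the quoted ratios compound over several All-Star selections. The 1% baseline probability is a hypothetical value chosen for illustration, not a number from the models, and the function names are ours.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def updated_probability(baseline_prob: float, odds_ratio: float, extra_units: int) -> float:
    """Apply an odds ratio once per extra unit, holding everything else constant."""
    new_odds = prob_to_odds(baseline_prob) * odds_ratio ** extra_units
    return odds_to_prob(new_odds)


# Odds ratios quoted in the article (approximate).
OR_PITCHER = 3.3
OR_POSITION = 3.75

# Hypothetical baseline probability, for illustration only.
BASELINE = 0.01

for k in range(4):
    p_pos = updated_probability(BASELINE, OR_POSITION, k)
    p_pit = updated_probability(BASELINE, OR_PITCHER, k)
    print(f"{k} extra selections: position {p_pos:.1%}, pitcher {p_pit:.1%}")
```

Because the ratio applies to odds, the probability grows quickly at first and then flattens as it approaches 1.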

For pitchers: wins still rule.

Each additional career win raises a pitcher's odds of induction by about 18% (OR 1.183, 95% CI [1.081, 1.388]). Counting stats are fading in modern analysis, but the voters are still counting.
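The "about 18%" figure is just the odds ratio minus one, and the odds ratio itself is the exponential of the logistic regression coefficient. A small sketch of those conversions, using only the numbers quoted above (the ten-win example is our own illustration):

```python
import math

# Values quoted in the article for career wins in the pitcher model.
odds_ratio = 1.183
ci_low, ci_high = 1.081, 1.388

# The model coefficient lives on the log-odds scale: beta = ln(OR).
beta = math.log(odds_ratio)
beta_low, beta_high = math.log(ci_low), math.log(ci_high)

# Percent change in odds per additional win.
pct_change = (odds_ratio - 1.0) * 100.0

# Odds ratios compound multiplicatively: ten extra wins multiply odds by OR**10.
ten_wins = odds_ratio ** 10

print(f"Coefficient (log-odds per win): {beta:.3f} [{beta_low:.3f}, {beta_high:.3f}]")
print(f"Change in odds per win: {pct_change:.1f}%")
print(f"Odds multiplier for ten extra wins: {ten_wins:.2f}")
```

Ten extra wins multiply the odds by a bit more than five under this estimate, which is why a per-win effect that looks modest adds up over a career.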

Position player model — odds ratios.

These are the largest effects in the position player model. An odds ratio above 1 means the factor is associated with higher odds of induction, and one below 1 with lower odds; the bars show 95% confidence intervals on a log scale.

The Veterans Committee effect is enormous but unstable — the committee considers very few players and inducts almost all of them, which creates near-perfect separation in the model.
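To see why separation makes an estimate unstable, consider an idealized toy case in which every committee-considered player is inducted. The data below are invented for illustration and are not the project's data. With the intercept held at the value implied by the non-committee players, the log-likelihood keeps improving as the committee coefficient grows, so no finite estimate is ever "best".

```python
import math


def log_likelihood(intercept: float, beta: float, data) -> float:
    """Bernoulli log-likelihood of a one-predictor logistic model."""
    total = 0.0
    for x, y in data:
        z = intercept + beta * x
        # log(sigmoid(z)) when y == 1, log(1 - sigmoid(z)) when y == 0.
        total += -math.log1p(math.exp(-z)) if y == 1 else -math.log1p(math.exp(z))
    return total


# Toy data (hypothetical): (committee_flag, inducted).
# All 10 committee players are inducted; 5 of 90 others are inducted.
data = [(1, 1)] * 10 + [(0, 1)] * 5 + [(0, 0)] * 85

# Intercept matching the induction odds among non-committee players.
intercept = math.log(5 / 85)

betas = [0, 2, 4, 8, 16]
log_likelihoods = [log_likelihood(intercept, b, data) for b in betas]

for b, ll in zip(betas, log_likelihoods):
    print(f"beta = {b:>2}: log-likelihood = {ll:.4f}")
```

Each larger coefficient fits slightly better than the last, which is why fitted odds ratios and their intervals can become extremely large in this situation.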

Pitcher model — odds ratios.

Same view for the pitcher model. Wins, All-Star selections, and the Veterans Committee dominate.

Who the model disagrees with.

The models aren't perfect — precision and recall both sit below 50% on our held-out test set. A few cases the model gets "wrong" are revealing:

Adrian Beltré and Yadier Molina — predicted unlikely, but near-locks to voters.
Joe Torre — inducted for managing, not his playing stats.
Curt Schilling — strong stats, but character concerns kept him out.

The model sees counting stats and awards. It can't see reputation, narrative, or a voter's grudge.
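For reference, precision and recall are computed from the counts of correct and incorrect predictions. The counts in this sketch are hypothetical, chosen only so that both metrics land below 50%; they are not the project's test-set results.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
tp = 4  # predicted inducted, actually inducted
fp = 5  # predicted inducted, not inducted
fn = 6  # predicted not inducted, actually inducted

precision, recall = precision_recall(tp, fp, fn)
print(f"Precision: {precision:.1%}")
print(f"Recall:    {recall:.1%}")
```

Precision asks how many predicted inductees were really inducted; recall asks how many real inductees the model found. The cases listed above are the kind of misses that pull both numbers down.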

The End

Try our interactive prediction app: klobby19.shinyapps.io/explorable.

Built from the STAT 423 final project (Kobe Sarausad, Dante Renteria, Andy, Kai). Source: github.com/danter2000/STAT423-Project.