NCAA football team draft rates

When I see a straight-line fit, I always wonder whether it falls into the ask-for-a-line category, and in particular whether the trend is really significant and really linear.

It always helps when the raw data is available. The original Reddit post does eventually point to the data (CSV), though I had to go through a chatbot to see it. The raw data covers 44,000+ players who graduated high school between 2005 and 2025. For each player there are several fields, including a composite rating, the college they played for, and whether or not they were drafted by the NFL. The general idea of the graph is to show how draft likelihood depends on high school rating and school. Presumably, schools that sit far above or below the overall trend line do better or worse at preparing players to be drafted.

Player level trends

Since the raw data is at the player level rather than the team level, I'll start there to give a sense of the data. This chart is restricted to SEC teams to keep the number of dots manageable for one chart (5,000 of the 44,000 players).

The composite ratings are on the x axis, and the coloring shows the star rating. I'm surprised the star ratings and the composite ratings align so well. I was under the impression that the composite ratings were relative within each year while the stars were more absolute, but I could be wrong, or maybe there isn't that much year-to-year variability in player quality.

In any case, we learn that the ratings mostly range from 0.7 to 1.0, and we see spikes at regular intervals at the low end. I think that means those ratings are coarser; for instance, perhaps each year's top 100 players are ranked precisely and the others are assigned to tiers. We can already make a rough assessment of the draft rates, and we can do better if we smooth the dot jitter a bit.
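One quick way to probe the idea that low-end ratings are coarser is to count how often each exact rating value occurs: values shared by many players look like tiers rather than individual rankings. This is a minimal sketch, assuming each player is a dict with a hypothetical `rating` key (the CSV's real column names may differ), and the `min_count` threshold is an arbitrary choice.

```python
from collections import Counter

def repeated_ratings(players, min_count=50, ndigits=4):
    """Return (rating, count) pairs shared by at least `min_count` players, most common first.

    Ratings are rounded to `ndigits` to absorb float noise from the CSV.
    The `rating` key is a hypothetical field name.
    """
    counts = Counter(round(p["rating"], ndigits) for p in players)
    return [(r, c) for r, c in counts.most_common() if c >= min_count]

# Toy example with made-up values (not from the data set):
toy = [{"rating": 0.8}] * 3 + [{"rating": 0.9712}, {"rating": 0.9655}]
print(repeated_ratings(toy, min_count=2))  # → [(0.8, 3)]
```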

Now it's easier to see that the proportion drafted is correlated with the rating. For instance, over half of the black dots (5-star ratings) are in the top (drafted = true) section, but far fewer than half are for the other groups.
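The comparison by star group amounts to a grouped proportion. Here is a minimal sketch, assuming players are dicts with hypothetical `stars` and `drafted` keys; it tallies drafted and total counts per group and divides.

```python
from collections import defaultdict

def draft_rate_by(players, key):
    """Proportion drafted within each group defined by `key` (e.g. star rating).

    Returns {group: (drafted, total, proportion)}. Field names are hypothetical.
    """
    tallies = defaultdict(lambda: [0, 0])  # group -> [drafted, total]
    for p in players:
        t = tallies[p[key]]
        t[0] += int(bool(p["drafted"]))
        t[1] += 1
    return {g: (d, n, d / n) for g, (d, n) in tallies.items()}

# Toy example with made-up players (not from the data set):
toy = [
    {"stars": 5, "drafted": True},
    {"stars": 5, "drafted": True},
    {"stars": 5, "drafted": False},
    {"stars": 3, "drafted": False},
    {"stars": 3, "drafted": True},
    {"stars": 3, "drafted": False},
    {"stars": 3, "drafted": False},
]
print(draft_rate_by(toy, "stars"))
```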

For all my charts (except the next one), I include only players who graduated high school in 2021 or earlier, though the data set goes through 2025. The original chart goes through 2022, which does capture some players who left school after only three years but also counts all four-year players as not drafted. (The raw data set doesn't indicate whether players are still in school.) To confirm that, here's a plot of the proportion of 4-star players drafted by year of high school graduation.

Even the 2021 rate may be lowered by players still in school, but 2022 is certainly affected, so I think it's right to exclude 2022.
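The cohort cutoff and the by-year check can be sketched as two small functions. This assumes hypothetical `hs_year`, `stars`, and `drafted` keys; the 2021 cutoff is the one described above.

```python
from collections import defaultdict

LAST_COMPLETE_YEAR = 2021  # later classes may still have players in school

def complete_cohorts(players, last_year=LAST_COMPLETE_YEAR):
    """Keep players whose high school class is old enough to have finished college."""
    return [p for p in players if p["hs_year"] <= last_year]

def rate_by_year(players, stars=4):
    """Proportion drafted per high school graduation year, for one star level."""
    tallies = defaultdict(lambda: [0, 0])  # year -> [drafted, total]
    for p in players:
        if p["stars"] == stars:
            t = tallies[p["hs_year"]]
            t[0] += int(bool(p["drafted"]))
            t[1] += 1
    return {year: d / n for year, (d, n) in sorted(tallies.items())}

# Toy example with made-up players (not from the data set):
toy = [
    {"hs_year": 2020, "stars": 4, "drafted": True},
    {"hs_year": 2020, "stars": 4, "drafted": False},
    {"hs_year": 2022, "stars": 4, "drafted": False},
    {"hs_year": 2022, "stars": 3, "drafted": True},
]
print(rate_by_year(toy))                    # → {2020: 0.5, 2022: 0.0}
print(rate_by_year(complete_cohorts(toy)))  # → {2020: 0.5}
```

Running `rate_by_year` on the unfiltered data is what exposes the drop-off in recent years; the filter is then applied before everything else.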

Even with the raw, unsummarized data, we can remap the drafted state to a 0 or 1 numeric value and fit a trend curve against rating for all 44,000 players.
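The post doesn't say which smoother produced the player-level curve. As a stand-in, here is a minimal sketch of a Gaussian kernel (Nadaraya-Watson) smoother: once drafted is coded 0/1, a locally weighted average of those values is a local estimate of the proportion drafted. The `rating` and `drafted` keys and the 0.02 bandwidth are assumptions.

```python
import math

def drafted_as_number(players):
    """Remap the drafted flag to 1 or 0 so it can be averaged."""
    return [1 if p["drafted"] else 0 for p in players]

def kernel_smooth(xs, ys, grid, bandwidth=0.02):
    """Gaussian-kernel weighted mean of ys evaluated at each grid point.

    With ys coded 0/1, each output is a local estimate of the proportion drafted.
    """
    smoothed = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        if total == 0:
            smoothed.append(float("nan"))  # no data near this grid point
        else:
            smoothed.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return smoothed

# Toy example with made-up players (not from the data set):
toy_players = [
    {"rating": 0.80, "drafted": False},
    {"rating": 0.85, "drafted": False},
    {"rating": 0.95, "drafted": True},
    {"rating": 0.99, "drafted": True},
]
xs = [p["rating"] for p in toy_players]
grid = [0.70 + 0.005 * i for i in range(61)]  # 0.70 to 1.00
curve = kernel_smooth(xs, drafted_as_number(toy_players), grid)
```

The bandwidth controls how wiggly the curve is; a smaller value follows local bumps (such as the rating spikes) more closely at the cost of noise.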

This suggests that, at least at the player level, the trend is not a straight line, even within the original 0.8 to 1.0 range. The slope changes at around 0.85 and then more sharply just past 0.95.
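One way to check whether those slope changes hold up (my suggestion, not what the chart uses) is a least-squares linear spline with knots at the eyeballed 0.85 and 0.95. The basis is 1, x, and a hinge max(0, x − k) per knot, so each hinge coefficient is the change in slope at its knot. A minimal pure-Python sketch, with the knot locations as assumptions:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hinge_basis(x, knots):
    """Linear-spline basis: 1, x, and max(0, x - k) for each knot k."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def fit_linear_spline(xs, ys, knots=(0.85, 0.95)):
    """Least-squares piecewise-linear fit; returns coefficients and per-segment slopes."""
    rows = [hinge_basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    # Slope of segment k is the x coefficient plus the first k hinge coefficients.
    slopes = [sum(coef[1:2 + k]) for k in range(len(knots) + 1)]
    return coef, slopes

# Synthetic check (not real data): slope 0.1, then 0.6 past 0.85, then 2.6 past 0.95.
true_coef = [-0.06, 0.1, 0.5, 2.0]
xs = [0.70 + 0.005 * i for i in range(61)]
ys = [sum(c * v for c, v in zip(true_coef, hinge_basis(x, (0.85, 0.95)))) for x in xs]
coef, slopes = fit_linear_spline(xs, ys)
print([round(s, 3) for s in slopes])  # → [0.1, 0.6, 2.6]
```

On the real data, `xs` would be the ratings and `ys` the 0/1 drafted values; clearly increasing segment slopes would support the bends seen in the smoothed curve.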

Team level comparisons

Here’s a view of drafted proportion versus star ratings by team. Teams are ordered by descending overall draft percentage, with only the top 20 shown.
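The team ordering is an overall drafted proportion per college, sorted in descending order and truncated. A minimal sketch, assuming hypothetical `college` and `drafted` keys:

```python
from collections import Counter

def top_teams_by_draft_rate(players, n=20):
    """Teams sorted by overall proportion drafted (descending), keeping the first n."""
    totals = Counter(p["college"] for p in players)
    drafted = Counter(p["college"] for p in players if p["drafted"])
    rates = {team: drafted[team] / totals[team] for team in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]

# Toy example with made-up teams (not from the data set):
toy = [
    {"college": "A", "drafted": True},
    {"college": "A", "drafted": True},
    {"college": "B", "drafted": True},
    {"college": "B", "drafted": False},
    {"college": "C", "drafted": False},
]
print(top_teams_by_draft_rate(toy, n=2))  # → [('A', 1.0), ('B', 0.5)]
```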

The black lines represent the overall averages (so they're at the same positions in each panel). Some of the 5-star proportions have very wide confidence intervals because the counts are very small. For instance, South Carolina had only four 5-star players during this period, and all four were drafted, so it has a 100% draft rate with a wide confidence interval.
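The post doesn't say how its intervals were computed. As an illustration, the Wilson score interval is one standard choice for a binomial proportion, and it behaves sensibly at 4 of 4 where the simple p ± z·SE formula collapses to zero width. A minimal sketch (the interval method is my assumption):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - margin), min(1.0, center + margin)

low, high = wilson_interval(4, 4)  # four of four 5-star players drafted
print(f"{low:.2f} to {high:.2f}")  # → 0.51 to 1.00
```

So a 100% rate from four players is consistent with a true rate anywhere from about half to all, which is why that panel's interval is so wide.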

Finally, here's a close reproduction of the original, but using a spline smoother instead of a straight-line fit. Instead of white, I used each school's secondary color for the text, but admittedly it doesn't always work well (gray OSU text over a scarlet circle). I also added Notre Dame, which was inadvertently left off the original.

At this aggregation level, the curve and the line are not that different, so my only complaint about a straight-line fit is that it implies draft proportions below 0 for ratings less than 0.8. I also tried school logos as marks.
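The below-zero problem is simple arithmetic: a fitted line a + b·x crosses zero at x = −a/b and keeps going. This minimal sketch fits ordinary least squares to made-up team-level points (illustration only, not the real team values) and shows the line predicting a negative proportion just below the crossing.

```python
def ols_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def zero_crossing(a, b):
    """Rating at which the fitted line predicts a draft proportion of exactly 0."""
    return -a / b

# Made-up team-level points lying on y = 2 * (x - 0.8), for illustration only.
xs = [0.82, 0.86, 0.90, 0.94]
ys = [2 * (x - 0.8) for x in xs]
a, b = ols_line(xs, ys)
print(round(zero_crossing(a, b), 3))  # → 0.8
print(a + b * 0.75 < 0)               # → True: a negative "proportion"
```

A smoother fit only where the data are, or a model on a bounded scale, avoids implying impossible values outside the observed range.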

The school colors and logos weren't in the original data set, but I was able to use ChatGPT to collect them for me, which worked well enough for the larger schools.

Discover more from Raw Data Studies
