⚾ Currently featuring baseball, 2010 through 2026 seasons. More sports will be added in future releases.
PACKAGE 1

The Baseball Vault

"The complete historical record — games, players, stats, all in one place."

Formats: CSV, Excel

For: historians, fantasy players, casual researchers, students, writers.

$59

All sales final. Try the free sample first.

What's inside

  • Historical baseball archive — 27 tables of records dating back to 1871 (Batting, Pitching, Fielding, Teams, Salaries, HOF, Awards, All-Star, postseason, Managers, Parks, People, College, Franchises, etc.)
  • Retrosheet game logs — every MLB game 1898–2025 (224,877 games × 44 cols)
  • Game logs enriched — 39,782 games (2010–2026) with days rest, rolling 10-game win%, series numbers, back-to-back flags, run differentials
  • Standings master — daily 2010–2026 (32,847 rows)
  • Attendance master — every team every season 2010–2025
  • Postseason batting / pitching / fielding / series
  • Manager career stats — win%, career_games, career_wins (4,410 manager-seasons)
  • Franchise history — Teams ⨝ Franchises (3,614 team-seasons)
  • Salary history — 1985–2016 with rank and % of payroll
  • Hall of Fame + awards — MVP, Cy Young, ROY, Gold Glove, Silver Slugger, All-Star (3,430 players × 93 cols)
  • Player bios — 24,270 players: birthplace, debut year, college
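
As a rough illustration of the enriched game-log columns listed above (rolling 10-game win%, days rest, back-to-back flags), here is a minimal sketch for one team's schedule. The function name, field names, and exact definitions (for example, whether the rolling window includes the current game) are assumptions for illustration, not the package's actual schema.

```python
from collections import deque
from datetime import date

def enrich_game_log(games, window=10):
    """games: list of (game_date, won) tuples for ONE team, sorted by date.

    Hypothetical helper; column names are illustrative only.
    """
    recent = deque(maxlen=window)   # results of the last `window` games
    prev_date = None
    rows = []
    for game_date, won in games:
        # Rolling win% uses only games played BEFORE this one (no lookahead).
        rolling = sum(recent) / len(recent) if recent else None
        # Days rest = full off days since the previous game
        # (a doubleheader nightcap would come out as -1 under this definition).
        days_rest = (game_date - prev_date).days - 1 if prev_date else None
        rows.append({
            "date": game_date,
            "won": won,
            "rolling_win_pct": rolling,
            "days_rest": days_rest,
            "back_to_back": days_rest == 0,
        })
        recent.append(1 if won else 0)
        prev_date = game_date
    return rows

sample = [
    (date(2024, 4, 1), True),
    (date(2024, 4, 2), False),
    (date(2024, 4, 4), True),
]
enriched = enrich_game_log(sample)
```

Excluding the current game from the rolling window keeps the feature usable as a pre-game input when backtesting.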

Why it stands out

  • 27 historical tables in one clean download — no hunting across multiple sites
  • Enriched game logs with features no free source provides (rolling win%, days rest, series tracking)
  • Excel-friendly for non-coders — open and start analyzing immediately
  • Updated every November with the new season

Extras included

  • Data dictionary (every file + row counts)
  • README with suggested use cases
PACKAGE 2

The Oddsmaker

"Every betting line from 2010 to today — one clean file."

Formats: CSV, Excel

For: sports bettors, line shoppers, odds historians, betting model builders getting started.

$89

All sales final. Try the free sample first.

What's inside

  • Odds master — 76,000+ game lines, all sources merged, 2010–2026
  • Unified odds — cleaned moneyline history
  • SBR odds 2021–2025 + SBR totals 2022–2026
  • Line movement — opening to closing line shifts
  • Odds API historical — 2,466,967 rows of timestamped lines
  • Multiple book/source pulls (catchup, current, complete_odds_2016_2026, historical_odds, matched_odds)
  • Umpire tendencies — zone tendencies, called strike rate, K-boost vs league average, all 135 HP umpires, 2016–2026
  • Game ⨝ ump roster (105,680 game-ump rows)
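
To show how opening-to-closing line movement can be measured, here is a minimal sketch that converts American moneylines to implied probabilities and takes the difference. The function names are hypothetical, and the assumption that lines are stored as American moneylines is ours; check the data dictionary for the actual column format.

```python
def implied_prob(moneyline):
    """Convert an American moneyline to implied probability (vig included)."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog (or even money): risk 100 to win `moneyline`.
    return 100 / (moneyline + 100)

def line_move(open_ml, close_ml):
    """Change in implied probability from open to close.

    Positive means the market moved toward this side.
    """
    return implied_prob(close_ml) - implied_prob(open_ml)

# Example: a favorite that opened at -120 and closed at -150.
move = line_move(-120, -150)
```

Working in implied probability rather than raw moneyline avoids the discontinuity between -100 and +100.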

Why it stands out

  • Odds from 6+ sources merged and deduplicated into one master file
  • Line movement data showing where the sharp money went
  • Umpire zone tendencies included — a massive edge most bettors don't factor in
  • Deep historical coverage — every line 2010–2026 for backtesting any betting system

Extras included

  • Data dictionary
  • Umpire tendency lookup guide
  • Line movement interpretation guide
PACKAGE 3

The Sharp's Edge

"Everything a sharp bettor needs to build a winning model."

Formats: CSV, Excel, Parquet

For: serious bettors, quantitative analysts, model builders.

$149

All sales final. Try the free sample first.

What's inside

  • Everything in The Oddsmaker — all odds + umpire tendencies
  • Weather master — 139,783 game-day weather readings (temp, wind, humidity, pressure, dewpoint)
  • Game logs enriched — 33 cols including rolling win%, run diff, days rest
  • Park factors — by year and venue (510 rows × 18 cols)
  • Standings master
  • Venues — stadium details (lat/lon, elevation, attendance, duration)
  • Game logs raw (18 cols)

Why it stands out

  • Weather + odds + line movement + park factors + game context in one package
  • Parquet format for fast loading in Python/R — handle millions of rows instantly
  • The exact data inputs a professional betting model needs
  • Updated annually

Extras included

  • Data dictionary
  • Python quickstart notebook (load and join all files in 10 lines)
  • Weather variable glossary
PACKAGE 4

The Analyst's Playbook

"Every advanced stat for every player, enriched and ready."

Formats: CSV, Excel, Parquet

For: fantasy players, sabermetricians, front office analysts, advanced stats enthusiasts.

$119

All sales final. Try the free sample first.

What's inside

  • Pitcher stats enriched — 12,803 rows × 556 cols (QS rate, K-BB%, FIP-ERA delta, xERA-ERA delta, SIERA-ERA delta, BABIP vs career, ERA/FIP/K% YoY, career_ip)
  • Batter stats enriched — 22,905 rows × 483 cols (power_score, xwOBA-wOBA delta, xBA-BA delta, wRC+ YoY, plate discipline, career_pa, BABIP vs career)
  • WAR enriched (pitcher 57,364 / batter 125,317) — career_war_cumulative, peak, per_162, YoY, seasons_played
  • FIP / xFIP / SIERA full history
  • Pitcher Stuff+ ratings (2017–2025)
  • Pitcher velocity trends — within-season velocity changes
  • Team batting / pitching / fielding (FanGraphs team-level)
  • Bullpen usage — workload patterns 2015–2025
  • Sprint speed 2015–2025
  • Outs above average (Statcast OAA)
  • Running splits 2015–2025
  • Lineups master 2010–2026
  • TeamRankings stats
  • Statcast batter & pitcher EV/barrels, expected stats, percentiles, pitch arsenal
  • Catcher framing run values + pop time
  • SP name ⇄ MLBAM ID crosswalk
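
For readers curious what derived columns such as K-BB% and the FIP-ERA delta represent, here is a minimal sketch using the standard public FIP formula. The function names, the sign convention of the delta (FIP minus ERA), and the FIP constant of 3.10 are assumptions for illustration; the constant varies by season, and the package's own definitions are in its data dictionary.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching.

    `ip` must be decimal innings (180.333 for 180 1/3, not box-score "180.1").
    `fip_constant` is season-specific; 3.10 is only a placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant

def k_minus_bb_pct(k, bb, batters_faced):
    """K-BB%: strikeout rate minus walk rate, per batter faced."""
    return (k - bb) / batters_faced

def fip_era_delta(fip_value, era):
    """Assumed convention: negative means FIP is lower than ERA."""
    return fip_value - era

season_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0)
delta = fip_era_delta(season_fip, era=3.50)
kbb = k_minus_bb_pct(k=200, bb=50, batters_faced=720)
```

A FIP well below ERA is commonly read as a sign that a pitcher's results were worse than his strikeout, walk, and home-run rates would suggest.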

Why it stands out

  • Pre-computed derived features nobody wants to build themselves (YoY changes, career cumulative, luck deltas)
  • 556 pitcher cols + 483 batter cols — the most complete player stat file available anywhere
  • Lineup data going back to 2010 (Retrosheet-derived 2010–2015, MLB API 2016–2026)
  • Updated annually

Extras included

  • Data dictionary (all 556 pitcher cols + 483 batter cols explained)
  • Suggested features for fantasy scoring models
  • Python quickstart notebook
PACKAGE 5

The Pitch Lab

"12 seasons of every pitch thrown. Compute anything."

Formats: Parquet, JSON

For: pitch modeling researchers, Statcast enthusiasts, ML engineers, advanced fantasy analysts.

$169

All sales final. Try the free sample first.

What's inside

  • Statcast pitch-by-pitch — all 12 seasons 2015–2026, every single pitch (~7.7M rows × 118 cols)
  • Pitcher arsenal complete — pitch mix, velocity, spin, whiff rate, called strike rate, zone rate (2015–2026)
  • Head-to-head career — 637,696 unique pitcher × batter matchups (BA, wOBA, xwOBA, K rate, BB rate)
  • Head-to-head by season — 953,922 rows broken out per year
  • Batter / pitcher platoon splits (vs LHP/RHP, vs LHB/RHB)
  • Day/night splits per player
  • Home/away splits per player
  • Pitcher velocity fatigue — 5 pitch-count buckets (1-25, 26-50, 51-75, 76-100, 100+) showing velocity drop, spin drop, whiff rate within game
  • SP first inning ERA — starter-only first-inning K/BB/whiff (2015–2026)
  • Statcast batter & pitcher EV/barrels, expected stats, percentiles
  • Catcher framing + pop time
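
The pitch-count buckets used for velocity fatigue can be sketched as follows. This is a minimal illustration, assuming that pitch counts start at 1 and that the "100+" bucket begins at pitch 101 (the listing leaves pitch 100 itself ambiguous). The function names and input shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (upper bound, label) pairs; anything above 100 falls into "100+".
BUCKETS = [(25, "1-25"), (50, "26-50"), (75, "51-75"), (100, "76-100")]

def bucket_for(pitch_number):
    """Map a pitcher's in-game pitch count to its fatigue bucket."""
    if pitch_number < 1:
        raise ValueError("pitch_number starts at 1")
    for upper, label in BUCKETS:
        if pitch_number <= upper:
            return label
    return "100+"  # assumption: pitch 101 and later

def velocity_by_bucket(pitches):
    """pitches: iterable of (pitch_number, velocity_mph) for one pitcher-game.

    Returns the mean velocity per bucket, for buckets that have data.
    """
    grouped = defaultdict(list)
    for pitch_number, velo in pitches:
        grouped[bucket_for(pitch_number)].append(velo)
    return {label: mean(values) for label, values in grouped.items()}

curve = velocity_by_bucket([(1, 96.0), (2, 95.0), (90, 93.0)])
```

Comparing each later bucket's mean against the "1-25" bucket gives a simple within-game velocity drop.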

Why it stands out

  • 12 seasons of raw pitch-by-pitch data: ~7.7M rows × 118 cols, roughly 900 million data points, impractical to compile yourself
  • Head-to-head matchup data for 637,696 career pitcher-batter combinations — unique to this package
  • Velocity fatigue curves showing exactly when pitchers lose stuff within a game
  • Parquet + JSON for maximum compatibility with ML pipelines
  • Updated annually

Extras included

  • Data dictionary
  • Python quickstart notebook (load a season, filter by pitcher, plot velocity fatigue in 15 lines)
  • Statcast column glossary
PACKAGE 6

The Raw Feed

"The complete sports data warehouse. Every table. Two formats."

Formats: Parquet, JSON

For: data engineers, ML teams, quantitative researchers, hedge funds, serious betting operations.

$299

All sales final. Try the free sample first.

What's inside

  • Everything in packages 1 through 5
  • Every parquet file in our source library — 130+ tables
  • Full historical archive — 27 tables, records back to 1871
  • Full pitch-by-pitch tracking — 12 seasons
  • All odds sources
  • All enriched and computed files
  • All splits, matchups, fatigue curves, umpire tendencies

Why it stands out

  • One purchase = the entire dataset — nothing held back
  • Parquet + JSON = works with any modern data stack (pandas, Spark, DuckDB, BigQuery, Snowflake)
  • 130+ pre-cleaned tables ready to join — months of data engineering work already done
  • Updated annually — every table refreshed

Extras included

  • Full data dictionary for all 130+ files
  • Python quickstart notebook

Update policy

Every package is updated annually after the season ends (November). The latest version is always available at current pricing. Keep your receipt email safe — it contains your download link.

Most data sellers charge a recurring annual or subscription fee. We don't. Each version is a one-time purchase with no recurring charge. New annual versions are released each November and are sold separately at current pricing.