Not sure yet? Download the free sample first — all purchases are final once a file is accessed.
Free Sample

Try the data before you buy.

A 50 MB sample with one slice of every major dataset family. Open it, query it, see how it feels. If you like it, the full library is one click away.

Download free sample (50 MB)

No email required · No signup · Direct .zip · Parquet + CSV inside

What's in the sample

  • 2025 game logs (enriched) — every 2025 baseball game with rolling win%, days rest, run differential, series number, B2B flag. 2,400+ rows × 33 cols.
  • 2025 weather snapshot — game-day temperature, wind, humidity, pressure, dewpoint for ~2,400 games.
  • Lahman People table — full player biographical master, 24,270 players including DOB, birthplace, debut, college.
  • Statcast pitch-by-pitch (single 2026 month) — ~50,000 pitches × 118 cols, every column populated, the same schema as the full Pitch Lab.
  • Data dictionary + README — exactly what you'd ship in any of the paid packages.

Format: parquet (primary) + CSV (for non-coders). Drop the parquet into pandas / DuckDB / Spark, or open the CSV in Excel.
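For a first look before opening anything in a notebook, the archive can be inspected with the Python standard library alone. The sketch below lists what is inside the .zip and previews the first rows of one CSV. The member name used in the test data (csv/game_logs.csv) is a placeholder, not the sample's actual file name; the README in the download has the real ones.

```python
import csv
import io
import itertools
import zipfile


def list_members(archive):
    """Return the file names inside the archive, sorted, skipping folder entries."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if not n.endswith("/"))


def preview_csv(archive, member, n_rows=5):
    """Return (header, first n_rows rows as dicts) for one CSV inside the archive."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # utf-8-sig drops a byte-order mark if the CSV has one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            reader = csv.DictReader(text)
            rows = list(itertools.islice(reader, n_rows))
            return reader.fieldnames, rows
```

Both functions accept a path or an open binary file object. For the parquet files, pandas.read_parquet or a DuckDB query over the file path should load the same tables with column types preserved.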

What you're missing

  • 16 years of game logs — the sample shows 2025 only. The Baseball Vault covers 1898 → 2025 (Retrosheet) and 2010 → 2026 (enriched).
  • 76,000+ odds rows — none in the sample. The Oddsmaker has every line back to 2010.
  • All 12 Statcast seasons — the sample is one month. The Pitch Lab is 7.7M pitches × 118 cols.
  • 637,696 head-to-head career matchups — pitcher × batter career wOBA / xwOBA / K rate. Pitch Lab and up.
  • Pre-enriched 556-col pitcher stats & 483-col batter stats — YoY changes, luck deltas, career cumulative. Analyst's Playbook.
  • Pre-loaded SQLite database with hundreds of tables and a 50-query SQL cookbook. Complete Diamond only.

See all 7 packages →

How buyers use it

Three things people do with the data within a week of downloading.

📈

Backtest a betting model

Join odds_master to game_logs_enriched and weather, walk forward through 2010–2025, get a real CLV-aware ROI curve.
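The moving parts of that workflow can be sketched in plain Python. This is a minimal illustration, not the method shipped with any package: the join key game_id and every column name in the usage are assumptions, and the CLV formula is one common definition (price taken versus closing price, in decimal odds).

```python
def american_to_decimal(odds):
    """Convert American moneyline odds (e.g. +150, -200) to decimal odds."""
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / abs(odds)


def join_on(left, right, key):
    """Inner-join two lists of dicts on one key; right rows assumed unique per key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def walk_forward_splits(seasons, min_train=1):
    """Yield (training seasons, test season) pairs in chronological order."""
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


def clv(bet_decimal, close_decimal):
    """Closing line value: positive when the price taken beat the closing price."""
    return bet_decimal / close_decimal - 1


def flat_bet_roi(bets):
    """ROI of 1-unit flat bets; each bet is a (decimal_odds, won) pair."""
    if not bets:
        return 0.0
    profit = sum((odds - 1) if won else -1.0 for odds, won in bets)
    return profit / len(bets)
```

Fitting only on the training seasons of each split and betting only the test season keeps future information out of the model, which is the point of walking forward.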

🧪

Train a pitch-outcome model

Load all 12 Statcast seasons into pandas / DuckDB, build a per-pitch features table, fit LightGBM in 50 lines.
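A rough idea of the feature-table step, in plain Python. The column names below (release_speed, pfx_x, pfx_z, plate_x, plate_z, balls, strikes, description) are those of the public Statcast export and are assumed, not confirmed, to match the Pitch Lab schema; the whiff label is just one possible target.

```python
import math

# Statcast "description" values counted as a swing-and-miss (assumed target).
WHIFF = {"swinging_strike", "swinging_strike_blocked"}
FEATURES = ("release_speed", "pfx_x", "pfx_z", "plate_x", "plate_z", "balls", "strikes")


def _missing(value):
    """True for None or NaN, the two ways a gap usually shows up after loading."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_features(pitches):
    """Return (X, y): numeric feature rows and 1/0 whiff labels.

    Pitches with any missing feature are skipped.
    """
    X, y = [], []
    for pitch in pitches:
        values = [pitch.get(name) for name in FEATURES]
        if any(_missing(v) for v in values):
            continue
        X.append([float(v) for v in values])
        y.append(1 if pitch.get("description") in WHIFF else 0)
    return X, y
```

X and y can then be handed to a gradient-boosting library such as LightGBM (for example its scikit-learn style LGBMClassifier); the fitting step is left out here so the sketch has no dependencies.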

Build a fantasy ranking

Use the 556-col pitcher / 483-col batter enriched files to score players on YoY change, luck deltas, and career arc.
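One simple way to turn such columns into a ranking is a weighted sum of z-scores. The sketch below is illustrative only: the metric names yoy_woba_change and luck_delta are hypothetical, and luck_delta is assumed here to mean actual minus expected wOBA, so a negative value marks a player who underperformed their contact quality.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1 (all zeros if constant)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_players(players, weights):
    """Return player names, best first, by the weighted sum of per-metric z-scores.

    players: list of dicts holding 'name' plus every metric named in weights.
    weights: metric name -> weight; use a negative weight when lower is better.
    """
    totals = [0.0] * len(players)
    for metric, weight in weights.items():
        for i, z in enumerate(zscores([p[metric] for p in players])):
            totals[i] += weight * z
    order = sorted(range(len(players)), key=lambda i: totals[i], reverse=True)
    return [players[i]["name"] for i in order]
```

Standardizing first keeps a metric with a wide numeric range from dominating the score; the weights then express how much each signal should count.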

Ready when you are

Updated each November; the latest version is always available at current pricing. Keep your receipt email safe: it contains your download link.

See all packages

Or download the free sample first.