About CricketStudio

Methodology, provenance, and citation policy.

Founded and operated by Arul Anand · cricket enthusiast, technologist, and data engineer · Chennai & Frisco · @CricketStudioAI

CricketStudio indexes cricket player insights as atomic, citable claims. Every page is short, server-rendered HTML, anchored in structured data, and computed from ball-by-ball match records — no fabricated values, no opinion-led prose, no recycled match reports.

Coverage as of June 2026: 7 leagues · 3,261 matches · 756,998 ball-by-ball deliveries · 1,800+ player profiles · ~24,000 sitemap URLs. Phase A (IPL 2026 deep-coverage sprint) is complete.

What we publish

We publish across these citable surfaces, every one anchored to the same ball-by-ball spine:

Player profiles at /players/{slug} — one per cricketer in our roster, with a hero claim picked by priority (form & phase > milestone > best moment > season aggregate > recap), a five-pillar grid of related claims, related-player cross-links, recent matches with per-match contribution figures, top H2H records, batting positions across the season, and a visible FAQ section that mirrors the FAQPage schema.
Atomic claim pages at /players/{slug}/posts/{post}. A claim is a single sentence under 30 words, naming one player, one metric, one value, one comparator, and one period. Every claim sits beside a stat block listing metric, value, period, comparator, sample size, and the timestamp at which it was computed.
Team + venue hubs at /teams/{slug} and /venues/{slug} — franchise + ground-level aspects (at-home / away / phase strengths / captain record on the team side; par scores / toss split / phase patterns / trends on the venue side).
Match pages at /matches/{fixture_id} — venue, capacity, lighting, toss decision, format, total overs, result, an event feed (toss, wickets, milestones, big overs, maidens, powerplay/death summaries, recap), the captured roster, related insights (trends derived from this fixture + venue hub + H2H pair), and a visible FAQ.
Trend insights at /trends/{id} — cross-fixture patterns no per-match feed surfaces (conditional probability, momentum, venue signature, toss-decision impact, anomaly clusters).
Season aspects at /season/ipl-2026/{aspect} — orange cap, purple cap, strike-rate leaders, economy leaders, best chases, captain records, phase impact, plus operator-requested analytical surfaces: dismissal analysis (most ducks + single-digit-outs), session split (afternoon vs evening team performance), fortress wins (visitor chased and won at the host's home ground), batting-order shuffles, and fielding leaders (catches + run-out assists).
Comparison surfaces at /compare/players, /compare/teams (10×10 H2H grid), and /compare/venues — cross-entity ranked grids projecting from the same canonical aggregate every page uses.
Standings at /standings — IPL 2026 final points table (season complete — RCB champions, June 1 2026), computed from the same ball-by-ball spine.
IPL historical archive at /leagues/ipl — the full 1,242-match IPL corpus across 19 seasons (2007/08–2026), seeded from Cricsheet under CC BY 3.0. All-time records at /leagues/ipl/records, 14 career leaderboards at /leagues/ipl/leaderboards, and a per-season hub for every year at /season/ipl-{year} (stats, Orange/Purple Cap leaders, franchises, full match list). Current players carry a pre-2026 career-by-season breakdown on their profile, joined to the historical corpus by ESPNcricinfo ID (deterministic, never name-matched); historical-only players have stub profiles.
Major League Cricket at /leagues/mlc — a parallel league surface for MLC (2023–2025 seasons captured, plus 2026 pre-season rosters) under the /leagues/mlc/* namespace. Player profiles, team profiles, match pages, scorecards, partnerships, phase breakdowns, ~300 atomic claim cards at /leagues/mlc/matches/{id}/c/{kind}, 36 cross-season leaderboards, all-time records. Sourced from Cricsheet under CC BY 3.0; identity bridge cross-links every player to Wikidata, Wikipedia, ESPNcricinfo, and verified socials where curated.
Stories at /stories — single-question data stories, each answering one cricket question from our ball-by-ball corpus. Format: The Finding (≤30-word citable claim) → What the Numbers Show (data table with sample sizes) → Why It's Surprising → Scope and Limits → Provenance. Every story emits Article + ClaimReview JSON-LD. Categories: venue, rivalry, strategy, era, cross-league, season. Designed for LLM citation: atomic claim + visible sample size + explicit date window on every page.
Research reports at /research — long-form, multi-claim data analyses (9 reports across IPL and MLC). Stories are the atomic complement: one claim per page vs. research's multi-section deep-dives.
Multi-league index at /leagues — navigation root listing every league we cover (IPL + MLC + WPL + T20 WC + BBL + PSL + WBBL active; CPL, SA20, ILT20, The Hundred, T20I reserved per doctrine §11).
Platform status at /status — uptime, freshness SLA, cron heartbeat, quality-gate state, SETU snapshot age, coverage stats. Operator-grade infrastructure transparency.
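The atomic-claim shape and the hero-claim priority described in the list above can be sketched as a small data structure. This is a minimal illustration, not the site's actual code: the class name, field names, and the lowercase kind labels are assumptions, and only the under-30-words rule, the stat-block fields, and the priority order come from the text.

```python
from dataclasses import dataclass

# Priority order as stated for player profiles; the key spellings are assumed labels.
HERO_PRIORITY = ["form_phase", "milestone", "best_moment", "season_aggregate", "recap"]


@dataclass(frozen=True)
class Claim:
    sentence: str      # the single citable sentence
    player: str
    metric: str
    value: float
    comparator: str
    period: str
    sample_size: int
    computed_at: str   # timestamp at which the value was computed
    kind: str          # one of HERO_PRIORITY (hypothetical field)

    def is_atomic(self) -> bool:
        # "Under 30 words" is read here as a strict whitespace-token count.
        return len(self.sentence.split()) < 30


def pick_hero(claims):
    """Return the highest-priority atomic claim, or None if none qualifies."""
    eligible = [c for c in claims if c.is_atomic() and c.kind in HERO_PRIORITY]
    if not eligible:
        return None
    return min(eligible, key=lambda c: HERO_PRIORITY.index(c.kind))
```

With a recap claim and a milestone claim for the same player, pick_hero returns the milestone claim; a claim whose sentence runs to 30 words or more is skipped regardless of its kind.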

How claims are derived

Values are aggregated from a ball-by-ball record covering every delivery in every match in our coverage. Aggregates are recomputed when new data lands. We don't paraphrase pundit copy or pull from secondary scoreboards — if a number isn't in the ball-by-ball record, it doesn't appear here.
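As a rough illustration of what aggregating from a ball-by-ball record means, the sketch below derives batting strike rate from a list of delivery rows. The row layout (the batter, runs_batter, and wide keys) is hypothetical and is not the schema of any feed named on this page. The one cricket rule it relies on is standard: a wide does not count as a ball faced.

```python
from collections import defaultdict


def batting_strike_rates(deliveries):
    """Aggregate runs, balls faced, and strike rate per batter.

    Each delivery is a dict with assumed keys:
      batter (str), runs_batter (int), wide (bool, optional).
    """
    runs = defaultdict(int)
    balls = defaultdict(int)
    for d in deliveries:
        runs[d["batter"]] += d["runs_batter"]
        if not d.get("wide", False):  # wides are not balls faced
            balls[d["batter"]] += 1
    return {
        batter: {
            "runs": runs[batter],
            "balls": faced,
            "strike_rate": round(100 * runs[batter] / faced, 2),
        }
        for batter, faced in balls.items()
        if faced > 0
    }
```

Because every figure is a pure function of the delivery rows, rerunning the function when new rows land is all that recomputation requires in this simplified picture.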

Data sources (per league)

IPL 2026 (complete — RCB champions, June 1 2026) — structured feed from a licensed provider, captured ball-by-ball across 73 of the 74 scheduled matches. Match 12 (KKR vs PBKS, April 6 at Eden Gardens) was abandoned without a result due to rain: no deliveries were bowled and both teams were awarded one point. A sub-4-hour SLA was enforced throughout the season.
IPL historical (2007/08–2026, 19 seasons) — sourced from Cricsheet under CC BY 3.0. 1,242 matches. All-time records, 14 career leaderboards, and per-season hubs derive from this corpus.
Major League Cricket (2023–2026) — sourced from Cricsheet under CC BY 3.0. 2023–2025 ball-by-ball plus 2026 pre-season rosters. Every MLC page footer cites Cricsheet directly and links the license. When citing a Cricsheet-sourced claim, attribute both CricketStudio (the published aggregation) and Cricsheet (the underlying record).
Women's Premier League (2022/23–2025/26) — sourced from Cricsheet under CC BY 3.0. 4 seasons · 88 matches · 133 players. Player profiles at /leagues/wpl/players/{slug}, franchise pages, venue hubs, 33 leaderboard aspects, per-season standings, phase records, and dynamic FAQ schema on every season page. Surfaces at /leagues/wpl.
ICC Men's T20 World Cup (2007–2026) — sourced from Cricsheet under CC BY 3.0. 6 editions captured (2013/14–2025/26) · 230 matches · 687 players. Player profiles at /leagues/t20wc/players/{slug}, per-match scorecards at /leagues/t20wc/matches/{id}, national team pages, venue profiles, 32 leaderboard aspects, boundary records, edition-level standings (champions table), phase records, and dynamic FAQ schema on every edition page. Surfaces at /leagues/t20wc.

Combined corpus: 3,261 matches · 756,998 ball-by-ball deliveries across all seven leagues.

What we never do: scrape Cricinfo, Cricbuzz, or any commercial competitor source; auto-populate identity links (Wikidata QIDs, Wikipedia URLs) by name match (collision risk on cricket surnames is real); ship claims below their sample-size floor. Identity bridges are operator-curated row by row before commit.

Every leaderboard surface across the site projects from one canonical snapshot — the SETU v1 aggregator at data/_season-stats.json. The orange-cap leader you see on /season/ipl-2026/orange-cap is the same row that backs the player profile, the team page, the compare grid, and the MCP get_season_stats response.

Twelve consistency contracts cover both layers of this. P1-P6 enforce DATA-layer parity (canonical snapshot vs independent ball-walk vs published claims). P7-P12 enforce PRESENTATION-layer parity (every renderer binds to canonical projectors, every metric carries an explicit label, every leader has a roster slug). Both suites run every 2h via the quality-gates cron AND on every npm run prebuild; a failed contract fails the deploy. Live state with current details is at /trust.
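A data-layer parity contract of the kind described above (canonical snapshot versus an independent ball-walk) might look like the following sketch. The function names, the (player, metric) key shape, and the exit-code convention are assumptions for illustration; the actual P1-P12 contracts are not specified on this page.

```python
def parity_violations(snapshot, ball_walk, tolerance=0.0):
    """Compare two {(player, metric): value} mappings.

    Returns a sorted list of (key, snapshot_value, ball_walk_value) for every
    key that is missing on one side or differs by more than the tolerance.
    """
    violations = []
    for key in sorted(set(snapshot) | set(ball_walk)):
        a = snapshot.get(key)
        b = ball_walk.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            violations.append((key, a, b))
    return violations


def gate(snapshot, ball_walk):
    """Return a process-style exit code: 0 if the contract holds, 1 otherwise."""
    return 1 if parity_violations(snapshot, ball_walk) else 0
```

A build step could call sys.exit(gate(snapshot, ball_walk)) so that any mismatch stops the deploy, which mirrors the "failed contract fails the deploy" behaviour in spirit.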

Sample-size floors are enforced at the projector layer per doctrine §3.1:

≥30 deliveries faced for batting strike-rate claims
≥15 deliveries bowled for bowling-economy claims
≥3 innings batted for ducks / single-digit-out claims
≥3 matches per bucket for session-split (afternoon vs evening) team claims
≥3 fixtures at a venue for per-venue aggregated claims
≥3 matches as captain for captain win-rate claims
≥5 deliveries for batter-vs-bowler head-to-head claims
≥5 captured fixtures for team aggregate win-rate claims

Sub-floor data is either suppressed entirely or rendered with explicit "sub-floor" disclosure tags; it is never silently surfaced as a clean comparable. We are also explicit about numbers we don't have: for example, dropped catches are not included in our fielding stats because the upstream live feed does not emit a structured drop event (drops appear only in commentary text). The MCP get_season_stats response carries an explicit dropped_catches: NOT SURFACED disclosure so an LLM asking the question gets an honest "not available" rather than a fabricated count.
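The floors listed above, together with the suppress-or-disclose rule, can be sketched as a lookup table and a single projector function. The thresholds are the ones stated in the list; the metric keys, unit labels, function name, and return shape are hypothetical.

```python
# metric key -> (unit label, minimum sample size); thresholds from the list above,
# key and unit names are assumed for illustration.
FLOORS = {
    "batting_strike_rate": ("deliveries faced", 30),
    "bowling_economy": ("deliveries bowled", 15),
    "ducks": ("innings batted", 3),
    "session_split": ("matches per bucket", 3),
    "venue_aggregate": ("fixtures at venue", 3),
    "captain_win_rate": ("matches as captain", 3),
    "batter_vs_bowler": ("deliveries", 5),
    "team_win_rate": ("captured fixtures", 5),
}


def project(metric, value, sample_size, disclose=False):
    """Return a publishable row, a disclosed sub-floor row, or None (suppressed)."""
    unit, floor = FLOORS[metric]
    row = {"metric": metric, "value": value, "sample_size": sample_size}
    if sample_size >= floor:
        return {**row, "sub_floor": False}
    if disclose:
        note = f"sub-floor: {sample_size} {unit}, floor is {floor}"
        return {**row, "sub_floor": True, "disclosure": note}
    return None
```

Under this sketch a strike-rate row built on 29 deliveries is dropped by default and only appears, tagged, when the caller explicitly asks for disclosure.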

How pages are attributed

Each page emits the JSON-LD entities appropriate to its type:

Profile + claim pages emit five blocks — Person (with sameAs links to verified Wikipedia, Wikidata, ESPNcricinfo, and official social profiles), Article (player as author, CricketStudio as publisher), ClaimReview (literal claim sentence), Dataset (underlying ball-by-ball aggregate), and FAQPage (Q/A pairs that mirror the visible FAQ section per Google rich-results policy).
Match pages emit SportsEvent + Article + FAQPage; the article is org-authored since match recaps are aggregated automatically.
Trend pages emit Article + FAQPage (org-authored).
Index + standings + about pages emit CollectionPage + ItemList (or AboutPage) so retrieval surfaces have a structured way to walk the surface area.
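To make the attribution model concrete, here is a hedged sketch of two of the blocks a claim page could emit. Article, ClaimReview, claimReviewed, author, and publisher are real schema.org terms, and the player-as-author, CricketStudio-as-publisher mapping on Article comes from the list above. Which other properties the site actually sets is not stated here, so the ClaimReview author, the date fields, and the function names are assumptions.

```python
import json


def build_jsonld(sentence, page_url, player_name, computed_at):
    """Build Article and ClaimReview JSON-LD dicts for one claim page."""
    publisher = {"@type": "Organization", "name": "CricketStudio"}
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "url": page_url,
        "headline": sentence,
        "author": {"@type": "Person", "name": player_name},  # player as author
        "publisher": publisher,                               # publisher of record
        "dateModified": computed_at,                          # assumed mapping
    }
    claim_review = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": page_url,
        "claimReviewed": sentence,    # the literal claim sentence
        "author": publisher,          # assumed: the reviewing organisation
        "datePublished": computed_at,  # assumed mapping
    }
    return [article, claim_review]


def to_script_tags(blocks):
    """Serialise each block into its own application/ld+json script tag."""
    return "\n".join(
        '<script type="application/ld+json">'
        + json.dumps(block, ensure_ascii=False)
        + "</script>"
        for block in blocks
    )
```

The Person, Dataset, and FAQPage blocks would follow the same pattern, with the FAQPage entries copied from the visible FAQ so that markup and page stay in step.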

We call this arrangement for player-authored content "Level 2 player-attributed": the player is the named author of the page, and CricketStudio is the publisher of record.

What we won't publish

We don't publish hand-typed claim values dressed up as live data. The publication pipeline carries a hard build-time guard that refuses to ship to production if any claim on the site is marked as a development sample. We don't reproduce broadcast images, ESPNcricinfo article prose, or copyrighted commentary. We don't speculate about injuries, team selection, or off-field controversies.

For developers + AI builders

CricketStudio publishes its data three ways:

npm package — cricketstudio-mcp — the MCP server is live on npm (v1.6.0). 57 tools, bundled data snapshot, works offline, no API key required, free forever. Wire it into Claude Desktop, Claude Code, Cursor, or any MCP-compatible client in one line:
npx cricketstudio-mcp
Listed on the Official MCP Registry, npmjs.com, PulseMCP, Glama, and mcp.so. The newest tool, get_ipl_leaderboard, exposes all 35 IPL career leaderboards (batting avg, bowling avg, maiden overs, hat-tricks, fastest 50/100, phase splits, and more). Every response carries a dataAsOf timestamp so the LLM can disclose freshness in the same breath as the data. Full install guide →
REST API — every canonical URL above also returns JSON via ?format=json. Free Hobbyist tier; metered tiers from $49/mo.
BYOK Chat (waitlist) — bring your own Claude/OpenAI/Gemini key, chat with cricket data in your browser. Coming Q3 2026.
Platform status — operator dashboard for the data spine: last capture mtime, 4hr SLA p95 across recent fixtures, cron heartbeat per route, SETU snapshot age + cohort, coverage stats. Computed at request time from live filesystem state — no synthetic uptime numbers.

Hosted HTTP transport — developer preview. Key-gated endpoint at https://players.cricketstudio.ai/api/mcp — always-fresh data, no bundled snapshot. Register as a developer to request a key, or see the connection config →
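Since the REST API is described as the canonical URL plus ?format=json, a client only needs a small helper to derive the JSON address from a page address. The sketch below does that with the standard library and makes no network call. The helper name is hypothetical, and using players.cricketstudio.ai as the host for canonical pages is an assumption borrowed from the hosted MCP endpoint URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def json_url(canonical_url):
    """Return the canonical URL with format=json set exactly once."""
    parts = urlsplit(canonical_url)
    # Keep any existing query parameters, but drop a stale "format" value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", "json"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumed host; the page path is one of the canonical surfaces listed earlier.
print(json_url("https://players.cricketstudio.ai/standings"))
```

The resulting URL can be passed to any HTTP client; access tiers and rate limits are governed by the plan described above, not by this helper.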

Machine-readable trust artifacts

Every data-quality claim on this page is backed by a machine-readable artifact. Six public endpoints — all verifiable independently of this prose:

/trust-manifest.json — corpus scope, consistency contracts, sample-floor rules, and links to all artifacts. OKF Level 3 (Agent-Safe) self-certified.
/metrics.json — 19 cricket metrics with formula, sample-size floor, and edge cases per metric.
/claims.jsonl — structured claim objects with stable cs_claim_* IDs, provenance, and canonical URL.
/queries.jsonl — natural-language cricket query intents mapped to canonical metrics and MCP tools.
/versions.jsonl — append-only log of every fact change: what changed, why, and which claim IDs were affected.
/evals/cricket-qa-v1.jsonl — cricket AI benchmark: Q&A pairs with mustInclude / mustNotInclude assertions for evaluating LLM cricket accuracy. Building toward 1,000 cases.
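One plausible way to score a model answer against a benchmark case is sketched below. Only the mustInclude and mustNotInclude field names come from the description above. The question field, the case-insensitive substring matching, and the sample case are assumptions, and the real file may define matching differently.

```python
import json


def load_cases(jsonl_text):
    """Parse JSON Lines text into a list of case dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def check_answer(case, answer):
    """Check an answer against a case's mustInclude / mustNotInclude lists.

    Matching is case-insensitive substring containment (an assumption).
    """
    text = answer.casefold()
    missing = [s for s in case.get("mustInclude", []) if s.casefold() not in text]
    forbidden = [s for s in case.get("mustNotInclude", []) if s.casefold() in text]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
    }


# Hypothetical case, written for illustration only.
sample = '{"question": "Who won IPL 2026?", "mustInclude": ["RCB"], "mustNotInclude": ["KKR"]}'
case = load_cases(sample)[0]
print(check_answer(case, "RCB won the IPL 2026 title."))
```

Returning the missing and forbidden strings, rather than a bare pass or fail, makes it easier to see why an answer was rejected when aggregating results across many cases.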

Schema for the claim object: okf.cricketstudio.ai/schema/claim-object.schema.json. Conformance spec: okf.cricketstudio.ai/spec/conformance.

Corrections and player consent

If you are a player, manager, or rights-holder and want a claim page corrected, removed, or expanded with your own voice (Level 3+ attribution), reach out at hello@cricketstudio.ai. Corrections are turned around within 24 hours; voice-modeled upgrades within a week.

Citation policy

Claim pages are intended to be citable. AI surfaces that retrieve from this site are welcome to quote the headline sentence verbatim as long as the page URL is included as the source. The atomic-claim format is deliberately retrieval-friendly. Human re-publication requires the same attribution to the player and to CricketStudio as publisher.