
About CricketStudio

Methodology, provenance, and citation policy.

Founded and operated by Arul Anand · cricket enthusiast, technologist, and data engineer · Chennai & Frisco · @CricketStudioAI

CricketStudio indexes cricket player insights as atomic, citable claims. Every page is short, server-rendered HTML, anchored in structured data, and computed from ball-by-ball match records — no fabricated values, no opinion-led prose, no recycled match reports.

Coverage as of June 2026: 3 leagues · 1,317 matches · 312,309 ball-by-ball deliveries · 1,200+ player profiles · ~24,000 sitemap URLs. Phase A (IPL 2026 deep-coverage sprint) is complete.

What we publish

We publish across these citable surfaces, every one anchored to the same ball-by-ball spine:

How claims are derived

Values are aggregated from a ball-by-ball record covering every delivery in every match in our coverage. Aggregates are recomputed when new data lands. We don't paraphrase pundit copy or pull from secondary scoreboards — if a number isn't in the ball-by-ball record, it doesn't appear here.
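To show what "aggregated from a ball-by-ball record" means in practice, here is a minimal sketch. It is not CricketStudio's pipeline. The Delivery record and its fields (match_id, batter, runs_off_bat, is_wide) are a hypothetical, simplified schema. Every aggregate is recomputed from raw deliveries, so rerunning the function after new data lands refreshes all of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    # Hypothetical, simplified delivery record; not CricketStudio's actual schema.
    match_id: str
    batter: str
    runs_off_bat: int
    is_wide: bool  # wides are not counted as balls faced by the batter


def batting_aggregates(deliveries):
    """Recompute runs, balls faced, and strike rate per batter from raw deliveries."""
    runs = defaultdict(int)
    balls = defaultdict(int)
    for d in deliveries:
        runs[d.batter] += d.runs_off_bat
        if not d.is_wide:
            balls[d.batter] += 1
    result = {}
    for batter, total in runs.items():
        faced = balls[batter]
        result[batter] = {
            "runs": total,
            "balls": faced,
            # Strike rate = runs per 100 balls faced; undefined if no balls were faced.
            "strike_rate": round(100 * total / faced, 2) if faced else None,
        }
    return result


deliveries = [
    Delivery("m1", "Batter A", 4, False),
    Delivery("m1", "Batter A", 0, True),  # a wide: no ball faced
    Delivery("m1", "Batter A", 1, False),
    Delivery("m1", "Batter B", 6, False),
]
stats = batting_aggregates(deliveries)
```

Because nothing is cached between runs, a value can only appear if some delivery in the input produced it.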

Data sources (per league)

Combined corpus: 1,317 matches · 312,309 ball-by-ball deliveries across all three leagues.

What we never do: scrape Cricinfo, Cricbuzz, or any commercial competitor source; auto-populate identity links (Wikidata QIDs, Wikipedia URLs) by name match, because many cricketers share surnames and collisions are a real risk; or ship claims below their sample-size floor. Identity bridges are curated by an operator, row by row, before they are committed.
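The collision risk is easy to see with a toy roster. In this sketch, the player IDs, display names, and the bridge value are all hypothetical placeholders, not real players or Wikidata identifiers. Name matching returns two candidates for one name, while the curated bridge resolves only rows an operator has explicitly verified and never falls back to a name guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    player_id: str  # internal, stable ID (hypothetical format)
    display_name: str


# Hypothetical roster: two different players share a display name.
ROSTER = [
    Player("p001", "A Kumar"),
    Player("p002", "A Kumar"),
    Player("p003", "B Singh"),
]

# Operator-curated bridge: only verified rows exist. The value is a placeholder, not a real QID.
CURATED_BRIDGES = {"p003": "Q_PLACEHOLDER_3"}


def name_match_candidates(name, roster):
    """What naive name matching would return: possibly several players."""
    return [p.player_id for p in roster if p.display_name == name]


def identity_link(player_id):
    """Return a curated external ID, or None; never fall back to matching by name."""
    return CURATED_BRIDGES.get(player_id)
```

Here `name_match_candidates("A Kumar", ROSTER)` is ambiguous, and `identity_link("p001")` stays unset until someone curates it.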

Every leaderboard surface on the site projects from one canonical snapshot: the SETU v1 aggregator output at data/_season-stats.json. The orange-cap leader shown on /season/ipl-2026/orange-cap is the same row that backs the player profile, the team page, the compare grid, and the MCP get_season_stats response.
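A minimal sketch of the single-snapshot pattern follows. The snapshot layout, slugs, team names, and run totals are hypothetical; the real structure of data/_season-stats.json is not described on this page. Each surface is a small projector over the same parsed rows, so the orange-cap leader and the player profile are literally the same object.

```python
import json

# Hypothetical snapshot shape with made-up values; not the real data/_season-stats.json layout.
SNAPSHOT_JSON = """
{
  "season": "ipl-2026",
  "batting": [
    {"slug": "player-a", "team": "team-x", "runs": 412},
    {"slug": "player-b", "team": "team-y", "runs": 398}
  ]
}
"""


def orange_cap_leader(snapshot):
    # The orange cap goes to the season's leading run-scorer.
    return max(snapshot["batting"], key=lambda row: row["runs"])


def player_profile_row(snapshot, slug):
    return next((row for row in snapshot["batting"] if row["slug"] == slug), None)


def team_rows(snapshot, team):
    return [row for row in snapshot["batting"] if row["team"] == team]


snapshot = json.loads(SNAPSHOT_JSON)
leader = orange_cap_leader(snapshot)
```

Since no surface keeps its own copy of the numbers, two pages cannot disagree about the same player.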

Twelve consistency contracts enforce this guarantee across two layers. P1-P6 enforce DATA-layer parity (canonical snapshot vs. an independent ball-walk vs. published claims). P7-P12 enforce PRESENTATION-layer parity (every renderer binds to the canonical projectors, every metric carries an explicit label, and every leader has a roster slug). Both suites run every two hours via the quality-gates cron AND on every npm run prebuild; a failed contract fails the deploy. The live state of each contract is published at /trust.
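A DATA-layer parity contract can be sketched as "recompute independently, then diff". This is an illustration only: the function names and the dict-shaped deliveries are hypothetical, and the actual logic of P1-P6 is not published here. The independent ball-walk totals runs per batter, every disagreement with the snapshot is reported, and a nonzero exit code is what would fail the build.

```python
from collections import Counter


def ball_walk_runs(deliveries):
    """Independent recomputation: walk every delivery and total runs off the bat."""
    totals = Counter()
    for d in deliveries:
        totals[d["batter"]] += d["runs_off_bat"]
    return totals


def parity_mismatches(snapshot_runs, deliveries):
    """Return (player, snapshot_value, walked_value) for every disagreement."""
    walked = ball_walk_runs(deliveries)
    mismatches = []
    for player in sorted(set(snapshot_runs) | set(walked)):
        snapshot_value = snapshot_runs.get(player, 0)
        walked_value = walked.get(player, 0)
        if snapshot_value != walked_value:
            mismatches.append((player, snapshot_value, walked_value))
    return mismatches


def contract_exit_code(mismatches):
    # A prebuild script could pass this to sys.exit(); any nonzero code fails the build.
    return 1 if mismatches else 0


deliveries = [
    {"batter": "player-a", "runs_off_bat": 4},
    {"batter": "player-a", "runs_off_bat": 2},
    {"batter": "player-b", "runs_off_bat": 1},
]
```

With a snapshot of `{"player-a": 7, "player-b": 1}`, the check reports `("player-a", 7, 6)` and the exit code is 1.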

Sample-size floors are enforced at the projector layer per doctrine §3.1:

Sub-floor data is either suppressed entirely or rendered with an explicit "sub-floor" disclosure tag; it is never silently presented as a clean comparable. We are equally explicit about numbers we don't have. For example, dropped catches are NOT included in our fielding stats, because the upstream live feed does not emit a structured drop event (drops appear only in commentary text). The MCP get_season_stats response carries an explicit dropped_catches: NOT SURFACED disclosure, so an LLM asking about drops gets an honest "not available" rather than a fabricated count.
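Both sub-floor policies, plus the "not surfaced" disclosure, fit in a few lines. This is a hedged sketch: the 60-ball floor is a hypothetical value (the actual floors under doctrine §3.1 are not reproduced here), and the row and response shapes are assumptions. A projector either drops sub-floor rows or keeps them with a disclosure tag, and an unsupported metric is named explicitly instead of being omitted or guessed.

```python
# Hypothetical floor for illustration; not one of the actual §3.1 floors.
MIN_BALLS_FOR_STRIKE_RATE = 60


def project_strike_rates(rows, suppress=False):
    """Project strike-rate rows, either dropping or tagging those below the floor."""
    projected = []
    for row in rows:
        below_floor = row["balls"] < MIN_BALLS_FOR_STRIKE_RATE
        if below_floor and suppress:
            continue  # policy 1: suppress the row entirely
        out = dict(row)
        out["strike_rate"] = round(100 * row["runs"] / row["balls"], 2) if row["balls"] else None
        # policy 2: keep the row but attach an explicit disclosure tag
        out["disclosure"] = "sub-floor" if below_floor else None
        projected.append(out)
    return projected


def fielding_summary(catches):
    # Hypothetical response shape: a metric the feed cannot support is labeled, never guessed.
    return {"catches": catches, "dropped_catches": "NOT SURFACED"}


rows = [
    {"slug": "player-a", "runs": 180, "balls": 120},
    {"slug": "player-b", "runs": 40, "balls": 14},
]
tagged = project_strike_rates(rows)
suppressed = project_strike_rates(rows, suppress=True)
```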

How pages are attributed

Each page emits the JSON-LD entities appropriate to its type:

For player-authored content, this is what we call "Level 2 player-attributed": the player is named as the page's author, and CricketStudio is the publisher of record.
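Here is a minimal sketch of Level 2 attribution expressed as schema.org JSON-LD. It assumes an Article type; the per-page entity types the site actually emits are not listed on this page. The URL and names are placeholders. The player appears as a Person author and CricketStudio as the Organization publisher. When the block is embedded in HTML, the sequence "</" is escaped so that a headline cannot close the script element early.

```python
import json


def claim_jsonld(headline, page_url, player_name):
    """Build a minimal Level 2 attribution block: player as author, CricketStudio as publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",  # assumed type; actual per-page types may differ
        "headline": headline,
        "url": page_url,
        "author": {"@type": "Person", "name": player_name},
        "publisher": {"@type": "Organization", "name": "CricketStudio"},
    }


def jsonld_script_tag(data):
    # "<\/" is a valid JSON escape for "</" and keeps the script element intact.
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


tag = jsonld_script_tag(
    claim_jsonld("Player A's strike rate claim", "https://example.com/claims/player-a", "Player A")
)
```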

What we won't publish

We don't publish hand-typed claim values dressed up as live data. The publication pipeline carries a hard build-time guard that refuses to ship to production if any claim on the site is marked as a development sample. We don't reproduce broadcast images, ESPNcricinfo article prose, or copyrighted commentary. We don't speculate about injuries, team selection, or off-field controversies.
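A build-time guard like this can be as small as a scan plus an exception. In the sketch below, the is_dev_sample flag and the claim dict shape are hypothetical, not the pipeline's real field names. If the exception is left uncaught in a prebuild script, Python exits with a nonzero status, and that exit status is what blocks the production build.

```python
class DevSampleError(RuntimeError):
    """Raised when a claim marked as a development sample would reach production."""


def dev_sample_ids(claims):
    # "is_dev_sample" is a hypothetical flag name used for illustration.
    return [c["id"] for c in claims if c.get("is_dev_sample", False)]


def assert_no_dev_samples(claims):
    offenders = dev_sample_ids(claims)
    if offenders:
        raise DevSampleError("refusing production build; dev-sample claims: " + ", ".join(offenders))
```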

For developers + AI builders

CricketStudio publishes its data three ways:

Enterprise teams that need a hosted HTTP transport with key-gated access can contact us directly.
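For builders using MCP, the get_season_stats tool is invoked with a standard MCP tools/call message, which is JSON-RPC 2.0. This sketch only builds the request. The season argument is a hypothetical example and not a documented parameter; the tool's real input schema should be read from the server's tools/list response.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# "season" is a hypothetical argument name used for illustration.
message = tools_call_request(1, "get_season_stats", {"season": "ipl-2026"})
```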

Corrections and player consent

If you are a player, manager, or rights-holder and want a claim page corrected, removed, or expanded with your own voice (Level 3+ attribution), reach out at hello@cricketstudio.ai. Corrections are turned around within 24 hours; voice-modeled upgrades within a week.

Citation policy

Claim pages are intended to be citable, and the atomic-claim format is deliberately retrieval-friendly. AI surfaces that retrieve from this site are welcome to quote the headline sentence verbatim, provided the page URL is included as the source. Human re-publication requires the same attribution the page itself carries: the player as author and CricketStudio as publisher.