Agreement Analytics

Human vs LLM inter-rater reliability

Validated CCQR-9 judge. Comparing against model: gemini-3-flash-preview

Enter the admin password to load saved agreement runs and trigger new analytics jobs.

Score some transcripts with both a human rater and the LLM judge, then run agreement analysis to see inter-rater reliability metrics.
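As an illustration of what an inter-rater reliability metric measures, here is a minimal sketch of Cohen's kappa for one human rater and one LLM rater scoring the same transcripts. This page does not state which metrics the agreement analysis reports, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the sample scores are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(human, llm):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(human) != len(llm) or not human:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(human)

    # Observed agreement: fraction of items where both raters gave the same score.
    observed = sum(h == m for h, m in zip(human, llm)) / n

    # Expected chance agreement, from each rater's marginal score distribution.
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    expected = sum(human_counts[c] * llm_counts[c] for c in human_counts) / (n * n)

    # If both raters always use the same single category, agreement is trivially perfect.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up example scores, not data from any real run.
human_scores = [1, 2, 3, 3, 2, 1, 2, 3]
llm_scores = [1, 2, 3, 2, 2, 1, 3, 3]
print(round(cohen_kappa(human_scores, llm_scores), 3))  # 0.619
```

Kappa corrects raw percent agreement for the agreement expected by chance: in this made-up example the raters match on 6 of 8 transcripts (0.75), but the chance-corrected value is lower. For ordinal scores, a weighted kappa or a correlation-based measure may be more appropriate.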