Agreement Analytics
Human vs LLM inter-rater reliability
Validated CCQR-9 judge. Comparing against model: gemini-3-flash-preview
Score transcripts with both a human rater and the LLM judge, then run the agreement analysis to view inter-rater reliability metrics.
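The sketch below shows one way to turn paired human and LLM scores into reliability numbers. It is a minimal illustration under stated assumptions, not this tool's actual implementation. It assumes that each transcript receives one categorical score from each rater and that the two lists are aligned by transcript. It uses raw percent agreement and Cohen's kappa as example metrics. The CCQR-9 scale and the metrics the page actually reports are not specified here, so the function names and data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(human: Sequence[Hashable], llm: Sequence[Hashable]) -> float:
    """Fraction of transcripts where the human and LLM scores match exactly.

    Hypothetical helper: assumes human[i] and llm[i] score the same transcript.
    """
    if len(human) != len(llm) or not human:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(h == m for h, m in zip(human, llm)) / len(human)


def cohens_kappa(human: Sequence[Hashable], llm: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Hypothetical helper; treats scores as unordered categories (unweighted kappa).
    """
    p_o = percent_agreement(human, llm)  # observed agreement
    n = len(human)
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    # Expected chance agreement: probability that both raters pick the same
    # label if each labels independently according to its own marginal rates.
    p_e = sum(human_counts[c] * llm_counts[c] for c in human_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters gave every transcript the same single
        # label. Kappa is undefined here; returning 1.0 is a chosen convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up example scores for six transcripts, aligned by position.
    human_scores = [1, 2, 2, 3, 1, 2]
    llm_scores = [1, 2, 3, 3, 1, 1]
    print(round(percent_agreement(human_scores, llm_scores), 3))  # → 0.667
    print(round(cohens_kappa(human_scores, llm_scores), 3))  # → 0.52
```

Kappa is shown because it corrects for the agreement two raters would reach by chance, which raw percent agreement does not. If CCQR-9 scores are ordinal, a weighted kappa may be more appropriate. scikit-learn's `sklearn.metrics.cohen_kappa_score` supports this through its `weights` parameter (`"linear"` or `"quadratic"`).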