Select Datasets
Search & Filter
KPI
—
Models
—
Datasets
Last updated —
Model Comparison
Click ⭐ on a row to add it to the comparison (max 3 models)
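The three-row limit could be enforced with a small toggle helper like the one sketched here. The names `MAX_COMPARE` and `toggle_compare` are hypothetical, and ignoring a click once the limit is reached is an assumption; the dashboard may instead replace the oldest selection or show a warning.

```python
MAX_COMPARE = 3  # limit stated in the UI hint


def toggle_compare(selected, model):
    """Return the new comparison list after the star on `model` is clicked.

    The input list is never mutated; a new list is returned each time.
    """
    if model in selected:
        # A second click on a starred row removes it from the comparison.
        return [m for m in selected if m != model]
    if len(selected) >= MAX_COMPARE:
        # Assumption: clicks beyond the limit are ignored.
        return list(selected)
    return selected + [model]


picked = []
for name in ["model-a", "model-b", "model-c", "model-d", "model-a"]:
    picked = toggle_compare(picked, name)
```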
Leaderboard
Sorted by mean score across selected datasets
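As a rough illustration of this ordering, the sketch below averages each model's scores over the selected datasets and sorts best-first. The function name `rank_models`, the nested-dict layout, and the example scores are all made up for illustration. Skipping datasets a model has no score for (rather than counting them as zero) is an assumption; the dashboard may handle missing scores differently.

```python
from statistics import mean


def rank_models(scores, selected):
    """Return (model, mean_score) pairs sorted from highest to lowest mean.

    scores:   dict mapping model name -> {dataset name: score}
    selected: list of dataset names currently selected
    """
    ranked = []
    for model, per_dataset in scores.items():
        # Assumption: only datasets the model was actually scored on count.
        values = [per_dataset[d] for d in selected if d in per_dataset]
        if values:
            ranked.append((model, mean(values)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Illustrative scores only, not real results.
example_scores = {
    "model-a": {"ds1": 0.5, "ds2": 0.75},
    "model-b": {"ds1": 0.75, "ds2": 0.75},
    "model-c": {"ds2": 0.25},
}
leaderboard = rank_models(example_scores, ["ds1", "ds2"])
```

Changing the `selected` list changes the ranking, which is why the order can shift when datasets are toggled.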
Submit a Model for Evaluation
Provide your model details. Submissions are queued (status: pending) and evaluated automatically. Results appear on the leaderboard once evaluation finishes.
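The queueing behaviour described above might look roughly like the following in-memory sketch. Only the `pending` status comes from the text; the `Submission` and `SubmissionQueue` classes, the `done` status, and the `evaluate` callback are hypothetical stand-ins for whatever the real backend uses.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Submission:
    model_name: str
    status: str = "pending"  # status named in the submission text
    results: dict = field(default_factory=dict)


class SubmissionQueue:
    """First-in, first-out queue of submitted models."""

    def __init__(self):
        self._queue = deque()

    def submit(self, model_name):
        submission = Submission(model_name)
        self._queue.append(submission)
        return submission

    def process_next(self, evaluate):
        """Evaluate the oldest pending submission, or return None if empty."""
        if not self._queue:
            return None
        submission = self._queue.popleft()
        submission.results = evaluate(submission.model_name)
        submission.status = "done"  # hypothetical final status
        return submission


queue = SubmissionQueue()
first = queue.submit("my-model")
# Placeholder evaluator returning made-up scores.
finished = queue.process_next(lambda name: {"ds1": 0.5})
```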
About this Leaderboard
This dashboard ranks LLMs on telecom-focused datasets. Each cell shows the score and the metric type used to compute it (standard or llm-as-judge). Energy and CO₂ figures (TODO) will appear on hover.
The llm-as-judge metric uses OpenAI OSS-120B as the judge model.