Result
π
Congratulations to the competition winners!
π₯ 1st Place - “HFLN”
π₯ 2nd Place - “Z2A”
π₯ 3rd Place - “FNRX”
π₯ 1st Place - “HFLN”
π₯ 2nd Place - “Z2A”
π₯ 3rd Place - “FNRX”
Evaluation Leaderboard
Rank | Team Name | Total Prompt Score | Normalized Score |
---|---|---|---|
1 | HFLN | 0.0977 | 44.96 |
2 | Z2A | 0.0345 | 15.88 |
3 | FNRX | 0.0221 | 10.19 |
4 | ChatZero | 0.0161 | 7.42 |
5 | CARRASCO_VELO_Pablo | 0.0154 | 7.11 |
6 | MY_Team | 0.0121 | 5.57 |
7 | jaidh | 0.0088 | 4.04 |
- | Strong Baseline (Zero-Shot Multi-turn) | 0.0084 | 3.87 |
- | Weak Baseline (Zero-Shot) | 0.0021 | 0.96 |
Dis | MyAwesomeTeam | - | - |
Dis means disqualified.
MyAwesomeTeam is disqualified due to not using our provided package,llms4pcg.chat_with_llm
, for interacting with the model.
Model-Specific Team Rankings
Rank | Team Name | Model | Prompt Score | Normalized Prompt Score |
---|---|---|---|---|
1 | HFLN | gemma 2 | 0.0387 | 17.80 |
2 | HFLN | qwen 2.5 | 0.0386 | 17.76 |
3 | HFLN | phi 3 | 0.0204 | 9.40 |
4 | Z2A | phi 3 | 0.0168 | 7.74 |
5 | FNRX | qwen 2.5 | 0.0141 | 6.47 |
6 | Z2A | qwen 2.5 | 0.0095 | 4.39 |
7 | Z2A | gemma 2 | 0.0081 | 3.75 |
8 | CARRASCO_VELO_Pablo | qwen 2.5 | 0.0079 | 3.62 |
9 | ChatZero | qwen 2.5 | 0.0077 | 3.56 |
10 | ChatZero | gemma 2 | 0.0057 | 2.62 |
11 | FNRX | gemma 2 | 0.0056 | 2.56 |
12 | CARRASCO_VELO_Pablo | gemma 2 | 0.0054 | 2.50 |
13 | MY_Team | gemma 2 | 0.0052 | 2.39 |
14 | MY_Team | qwen 2.5 | 0.0052 | 2.38 |
15 | Zero-Shot Multi-turn | qwen 2.5 | 0.0051 | 2.36 |
16 | jaidh | gemma 2 | 0.0050 | 2.29 |
17 | jaidh | qwen 2.5 | 0.0038 | 1.74 |
18 | Zero-Shot Multi-turn | gemma 2 | 0.0029 | 1.35 |
19 | ChatZero | phi 3 | 0.0027 | 1.24 |
20 | FNRX | phi 3 | 0.0025 | 1.16 |
21 | CARRASCO_VELO_Pablo | phi 3 | 0.0021 | 0.98 |
22 | MY_Team | phi 3 | 0.0017 | 0.80 |
23 | Zero-Shot | gemma 2 | 0.0010 | 0.45 |
24 | Zero-Shot | qwen 2.5 | 0.0009 | 0.43 |
25 | Zero-Shot Multi-turn | phi 3 | 0.0003 | 0.16 |
26 | Zero-Shot | phi 3 | 0.0002 | 0.08 |
27 | jaidh | phi 3 | 0.0000 | 0.00 |
Evaluation Configuration
Parameters used during final evaluation:
- Competition package:
llms4pcg-python
version2.0.1
- Starting seed for LLM interaction (Ollama): 6280
- Random seed for programming package: 42
Download Competition Data
Competition related data can be downloaded from the following link (August 29, 2025): OSF Storage
- character_scores.csv
- constants.json - Weights and constant values.
- final_team_rankings.csv
- prompt_scores_ranks.csv
- trial_scores.csv
- zero_shot_multi_turn.zip
- zero_shot.zip
- CARRASCO_VELO_PABLO.zip
- ChatZero.zip
- FNRX.zip
- HFLN.zip
- jaidh.zip
- MyAwesomeTeam.zip
- MY_Team.zip
- Z2A.zip