|
gemini-2.5-pro-preview-06-05 (32k think) |
83.1%
|
$49.88
|
aider --model gemini/gemini-2.5-pro-preview-06-05 --thinking-tokens 32k |
99.6% |
diff-fenced |
-
Dirname
:
2025-06-06-16-36-21--gemini0605-32k-think-diff-fenced
-
Test cases
:
225
-
Model
:
gemini-2.5-pro-preview-06-05 (32k think)
-
Edit format
:
diff-fenced
-
Commit hash
:
f827f22
-
Thinking tokens
:
32768
-
Pass rate 1
:
46.2
-
Pass rate 2
:
83.1
-
Pass num 1
:
104
-
Pass num 2
:
187
-
Percent cases well formed
:
99.6
-
Error outputs
:
1
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
112
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2719961
-
Completion tokens
:
4648227
-
Test timeouts
:
0
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-pro-preview-06-05 --thinking-tokens 32k
-
Date
:
2025-06-06
-
Versions
:
0.84.1.dev
-
Seconds per case
:
200.3
-
Total cost
:
49.8822
|
|
o3 (high) + gpt-4.1 |
82.7%
|
$69.29
|
aider --model o3 --architect |
100.0% |
architect |
-
Dirname
:
2025-04-17-01-20-35--o3-mini-high-diff-arch
-
Test cases
:
225
-
Model
:
o3 (high) + gpt-4.1
-
Edit format
:
architect
-
Commit hash
:
80909e1-dirty
-
Editor model
:
gpt-4.1
-
Editor edit format
:
editor-diff
-
Pass rate 1
:
36.0
-
Pass rate 2
:
82.7
-
Pass num 1
:
81
-
Pass num 2
:
186
-
Percent cases well formed
:
100.0
-
Error outputs
:
9
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
166
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
0
-
Total tests
:
225
-
Command
:
aider --model o3 --architect
-
Date
:
2025-04-17
-
Versions
:
0.82.2.dev
-
Seconds per case
:
110.0
-
Total cost
:
69.2921
|
|
o3 (high) |
81.3%
|
$21.23
|
aider --model o3 --reasoning-effort high |
94.7% |
diff |
-
Dirname
:
2025-06-25-21-04-24--o3-price-reduction-high
-
Test cases
:
225
-
Model
:
o3 (high)
-
Edit format
:
diff
-
Commit hash
:
c48fea6
-
Reasoning effort
:
high
-
Pass rate 1
:
40.0
-
Pass rate 2
:
81.3
-
Pass num 1
:
90
-
Pass num 2
:
183
-
Percent cases well formed
:
94.7
-
Error outputs
:
25
-
Num malformed responses
:
23
-
Num with malformed responses
:
12
-
User asks
:
116
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Prompt tokens
:
3148932
-
Completion tokens
:
2047615
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model o3 --reasoning-effort high
-
Date
:
2025-06-25
-
Versions
:
0.84.1.dev
-
Seconds per case
:
197.3
-
Total cost
:
21.2259
|
|
gemini-2.5-pro-preview-06-05 (default think) |
79.1%
|
$45.6
|
aider --model gemini/gemini-2.5-pro-preview-06-05 |
100.0% |
diff-fenced |
-
Dirname
:
2025-06-06-18-38-56--gemini0605-diff-fenced
-
Test cases
:
225
-
Model
:
gemini-2.5-pro-preview-06-05 (default think)
-
Edit format
:
diff-fenced
-
Commit hash
:
4c161f9-dirty
-
Pass rate 1
:
44.9
-
Pass rate 2
:
79.1
-
Pass num 1
:
101
-
Pass num 2
:
178
-
Percent cases well formed
:
100.0
-
Error outputs
:
4
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
105
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
4
-
Prompt tokens
:
2751296
-
Completion tokens
:
4142197
-
Test timeouts
:
1
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-pro-preview-06-05
-
Date
:
2025-06-06
-
Versions
:
0.84.1.dev
-
Seconds per case
:
175.2
-
Total cost
:
45.5961
|
|
o3 |
76.9%
|
$13.75
|
aider --model o3 |
93.8% |
diff |
-
Dirname
:
2025-06-25-20-30-16--o3-price-reduction
-
Test cases
:
225
-
Model
:
o3
-
Edit format
:
diff
-
Commit hash
:
c48fea6
-
Pass rate 1
:
40.9
-
Pass rate 2
:
76.9
-
Pass num 1
:
92
-
Pass num 2
:
173
-
Percent cases well formed
:
93.8
-
Error outputs
:
22
-
Num malformed responses
:
22
-
Num with malformed responses
:
14
-
User asks
:
108
-
Lazy comments
:
2
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2893189
-
Completion tokens
:
1154767
-
Test timeouts
:
1
-
Total tests
:
225
-
Command
:
aider --model o3
-
Date
:
2025-06-25
-
Versions
:
0.84.1.dev
-
Seconds per case
:
101.7
-
Total cost
:
13.7517
|
|
Gemini 2.5 Pro Preview 05-06 |
76.9%
|
$37.41
|
aider --model gemini/gemini-2.5-pro-preview-05-06 |
97.3% |
diff-fenced |
-
Dirname
:
2025-05-07-19-32-40--gemini0506-diff-fenced-completion_cost
-
Test cases
:
225
-
Model
:
Gemini 2.5 Pro Preview 05-06
-
Edit format
:
diff-fenced
-
Commit hash
:
3b08327-dirty
-
Pass rate 1
:
36.4
-
Pass rate 2
:
76.9
-
Pass num 1
:
82
-
Pass num 2
:
173
-
Percent cases well formed
:
97.3
-
Error outputs
:
15
-
Num malformed responses
:
7
-
Num with malformed responses
:
6
-
User asks
:
105
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-pro-preview-05-06
-
Date
:
2025-05-07
-
Versions
:
0.82.4.dev
-
Seconds per case
:
165.3
-
Total cost
:
37.4104
|
|
Gemini 2.5 Pro Preview 03-25 |
72.9%
|
|
aider --model gemini/gemini-2.5-pro-preview-03-25 |
92.4% |
diff-fenced |
-
Dirname
:
2025-04-12-04-55-50--gemini-25-pro-diff-fenced
-
Test cases
:
225
-
Model
:
Gemini 2.5 Pro Preview 03-25
-
Edit format
:
diff-fenced
-
Commit hash
:
0282574
-
Pass rate 1
:
40.9
-
Pass rate 2
:
72.9
-
Pass num 1
:
92
-
Pass num 2
:
164
-
Percent cases well formed
:
92.4
-
Error outputs
:
21
-
Num malformed responses
:
21
-
Num with malformed responses
:
17
-
User asks
:
69
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-pro-preview-03-25
-
Date
:
2025-04-12
-
Versions
:
0.81.3.dev
-
Seconds per case
:
45.3
-
Total cost
:
0
|
|
claude-opus-4-20250514 (32k thinking) |
72.0%
|
$65.75
|
aider --model claude-opus-4-20250514 |
97.3% |
diff |
-
Dirname
:
2025-05-25-20-40-51--opus4-diff-exuser
-
Test cases
:
225
-
Model
:
claude-opus-4-20250514 (32k thinking)
-
Edit format
:
diff
-
Commit hash
:
9ef3211
-
Thinking tokens
:
32000
-
Pass rate 1
:
37.3
-
Pass rate 2
:
72.0
-
Pass num 1
:
84
-
Pass num 2
:
162
-
Percent cases well formed
:
97.3
-
Error outputs
:
10
-
Num malformed responses
:
6
-
Num with malformed responses
:
6
-
User asks
:
97
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2567514
-
Completion tokens
:
363142
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model claude-opus-4-20250514
-
Date
:
2025-05-25
-
Versions
:
0.83.3.dev
-
Seconds per case
:
44.1
-
Total cost
:
65.7484
|
|
o4-mini (high) |
72.0%
|
$19.64
|
aider --model o4-mini |
90.7% |
diff |
-
Dirname
:
2025-04-16-22-01-58--o4-mini-high-diff-exsys
-
Test cases
:
225
-
Model
:
o4-mini (high)
-
Edit format
:
diff
-
Commit hash
:
b66901f-dirty
-
Pass rate 1
:
19.6
-
Pass rate 2
:
72.0
-
Pass num 1
:
44
-
Pass num 2
:
162
-
Percent cases well formed
:
90.7
-
Error outputs
:
26
-
Num malformed responses
:
24
-
Num with malformed responses
:
21
-
User asks
:
66
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model o4-mini
-
Date
:
2025-04-16
-
Versions
:
0.82.1.dev
-
Seconds per case
:
176.5
-
Total cost
:
19.6399
|
|
DeepSeek R1 (0528) |
71.4%
|
$4.8
|
aider --model deepseek/deepseek-reasoner |
94.6% |
diff |
-
Dirname
:
2025-06-06-16-47-07--r1-diff
-
Test cases
:
224
-
Model
:
DeepSeek R1 (0528)
-
Edit format
:
diff
-
Commit hash
:
4c161f9-dirty
-
Pass rate 1
:
34.4
-
Pass rate 2
:
71.4
-
Pass num 1
:
77
-
Pass num 2
:
160
-
Percent cases well formed
:
94.6
-
Error outputs
:
28
-
Num malformed responses
:
15
-
Num with malformed responses
:
12
-
User asks
:
105
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2644169
-
Completion tokens
:
1842168
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model deepseek/deepseek-reasoner
-
Date
:
2025-06-06
-
Versions
:
0.84.1.dev
-
Seconds per case
:
716.6
-
Total cost
:
4.8016
|
|
claude-opus-4-20250514 (no think) |
70.7%
|
$68.63
|
aider --model claude-opus-4-20250514 |
98.7% |
diff |
-
Dirname
:
2025-05-25-19-57-20--opus4-diff-exuser
-
Test cases
:
225
-
Model
:
claude-opus-4-20250514 (no think)
-
Edit format
:
diff
-
Commit hash
:
9ef3211
-
Pass rate 1
:
32.9
-
Pass rate 2
:
70.7
-
Pass num 1
:
74
-
Pass num 2
:
159
-
Percent cases well formed
:
98.7
-
Error outputs
:
3
-
Num malformed responses
:
3
-
Num with malformed responses
:
3
-
User asks
:
105
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2671437
-
Completion tokens
:
380717
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model claude-opus-4-20250514
-
Date
:
2025-05-25
-
Versions
:
0.83.3.dev
-
Seconds per case
:
42.5
-
Total cost
:
68.6253
|
|
claude-3-7-sonnet-20250219 (32k thinking tokens) |
64.9%
|
$36.83
|
aider --model anthropic/claude-3-7-sonnet-20250219 --thinking-tokens 32k |
97.8% |
diff |
-
Dirname
:
2025-02-24-21-47-23--sonnet37-diff-think-32k-64k
-
Test cases
:
225
-
Model
:
claude-3-7-sonnet-20250219 (32k thinking tokens)
-
Edit format
:
diff
-
Commit hash
:
60d11a6, 93edbda
-
Pass rate 1
:
29.3
-
Pass rate 2
:
64.9
-
Pass num 1
:
66
-
Pass num 2
:
146
-
Percent cases well formed
:
97.8
-
Error outputs
:
66
-
Num malformed responses
:
5
-
Num with malformed responses
:
5
-
User asks
:
5
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
1
-
Total tests
:
225
-
Command
:
aider --model anthropic/claude-3-7-sonnet-20250219 --thinking-tokens 32k
-
Date
:
2025-02-24
-
Versions
:
0.75.1.dev
-
Seconds per case
:
105.2
-
Total cost
:
36.8343
|
|
DeepSeek R1 + claude-3-5-sonnet-20241022 |
64.0%
|
$13.29
|
aider --architect --model r1 --editor-model sonnet |
100.0% |
architect |
-
Dirname
:
2025-01-23-19-14-48--r1-architect-sonnet
-
Test cases
:
225
-
Model
:
DeepSeek R1 + claude-3-5-sonnet-20241022
-
Edit format
:
architect
-
Commit hash
:
05a77c7
-
Editor model
:
claude-3-5-sonnet-20241022
-
Editor edit format
:
editor-diff
-
Pass rate 1
:
27.1
-
Pass rate 2
:
64.0
-
Pass num 1
:
61
-
Pass num 2
:
144
-
Percent cases well formed
:
100.0
-
Error outputs
:
2
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
392
-
Lazy comments
:
6
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --architect --model r1 --editor-model sonnet
-
Date
:
2025-01-23
-
Versions
:
0.72.3.dev
-
Seconds per case
:
251.6
-
Total cost
:
13.2933
|
|
o1-2024-12-17 (high) |
61.7%
|
$186.5
|
aider --model openrouter/openai/o1 |
91.5% |
diff |
-
Dirname
:
2024-12-21-19-23-03--polyglot-o1-hard-diff
-
Test cases
:
224
-
Model
:
o1-2024-12-17 (high)
-
Edit format
:
diff
-
Commit hash
:
a755079-dirty
-
Pass rate 1
:
23.7
-
Pass rate 2
:
61.7
-
Pass num 1
:
53
-
Pass num 2
:
139
-
Percent cases well formed
:
91.5
-
Error outputs
:
25
-
Num malformed responses
:
24
-
Num with malformed responses
:
19
-
User asks
:
16
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model openrouter/openai/o1
-
Date
:
2024-12-21
-
Versions
:
0.69.2.dev
-
Seconds per case
:
133.2
-
Total cost
:
186.4958
|
|
claude-sonnet-4-20250514 (32k thinking) |
61.3%
|
$26.58
|
aider --model claude-sonnet-4-20250514 |
97.3% |
diff |
-
Dirname
:
2025-05-24-22-10-36--sonnet4-diff-exuser-think32k
-
Test cases
:
225
-
Model
:
claude-sonnet-4-20250514 (32k thinking)
-
Edit format
:
diff
-
Commit hash
:
e3cb907
-
Thinking tokens
:
32000
-
Pass rate 1
:
25.8
-
Pass rate 2
:
61.3
-
Pass num 1
:
58
-
Pass num 2
:
138
-
Percent cases well formed
:
97.3
-
Error outputs
:
10
-
Num malformed responses
:
10
-
Num with malformed responses
:
6
-
User asks
:
111
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2863068
-
Completion tokens
:
1271074
-
Test timeouts
:
6
-
Total tests
:
225
-
Command
:
aider --model claude-sonnet-4-20250514
-
Date
:
2025-05-24
-
Versions
:
0.83.3.dev
-
Seconds per case
:
79.9
-
Total cost
:
26.5755
|
|
claude-3-7-sonnet-20250219 (no thinking) |
60.4%
|
$17.72
|
aider --model sonnet |
93.3% |
diff |
-
Dirname
:
2025-02-24-19-54-07--sonnet37-diff
-
Test cases
:
225
-
Model
:
claude-3-7-sonnet-20250219 (no thinking)
-
Edit format
:
diff
-
Commit hash
:
75e9ee6
-
Pass rate 1
:
24.4
-
Pass rate 2
:
60.4
-
Pass num 1
:
55
-
Pass num 2
:
136
-
Percent cases well formed
:
93.3
-
Error outputs
:
16
-
Num malformed responses
:
16
-
Num with malformed responses
:
15
-
User asks
:
12
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
0
-
Total tests
:
225
-
Command
:
aider --model sonnet
-
Date
:
2025-02-24
-
Versions
:
0.74.4.dev
-
Seconds per case
:
28.3
-
Total cost
:
17.7191
|
|
o3-mini (high) |
60.4%
|
$18.16
|
aider --model o3-mini --reasoning-effort high |
93.3% |
diff |
-
Dirname
:
2025-01-31-20-42-47--o3-mini-diff-high
-
Test cases
:
224
-
Model
:
o3-mini (high)
-
Edit format
:
diff
-
Commit hash
:
b0d58d1-dirty
-
Pass rate 1
:
21.0
-
Pass rate 2
:
60.4
-
Pass num 1
:
47
-
Pass num 2
:
136
-
Percent cases well formed
:
93.3
-
Error outputs
:
26
-
Num malformed responses
:
24
-
Num with malformed responses
:
15
-
User asks
:
19
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
7
-
Total tests
:
225
-
Command
:
aider --model o3-mini --reasoning-effort high
-
Date
:
2025-01-31
-
Versions
:
0.72.4.dev
-
Seconds per case
:
124.6
-
Total cost
:
18.1584
|
|
Qwen3 235B A22B diff, no think, Alibaba API |
59.6%
|
|
aider --model openai/qwen3-235b-a22b |
92.9% |
diff |
-
Dirname
:
2025-05-09-17-02-02--qwen3-235b-a22b.unthink_16k_diff
-
Test cases
:
225
-
Model
:
Qwen3 235B A22B diff, no think, Alibaba API
-
Edit format
:
diff
-
Commit hash
:
91d7fbd-dirty
-
Pass rate 1
:
28.9
-
Pass rate 2
:
59.6
-
Pass num 1
:
65
-
Pass num 2
:
134
-
Percent cases well formed
:
92.9
-
Error outputs
:
22
-
Num malformed responses
:
22
-
Num with malformed responses
:
16
-
User asks
:
111
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
2816192
-
Completion tokens
:
342062
-
Test timeouts
:
1
-
Total tests
:
225
-
Command
:
aider --model openai/qwen3-235b-a22b
-
Date
:
2025-05-09
-
Versions
:
0.82.4.dev
-
Seconds per case
:
45.4
-
Total cost
:
0.0
|
|
DeepSeek R1 |
56.9%
|
$5.42
|
aider --model deepseek/deepseek-reasoner |
96.9% |
diff |
-
Dirname
:
2025-01-20-19-11-38--ds-turns-upd-cur-msgs-fix-with-summarizer
-
Test cases
:
225
-
Model
:
DeepSeek R1
-
Edit format
:
diff
-
Commit hash
:
5650697-dirty
-
Pass rate 1
:
26.7
-
Pass rate 2
:
56.9
-
Pass num 1
:
60
-
Pass num 2
:
128
-
Percent cases well formed
:
96.9
-
Error outputs
:
8
-
Num malformed responses
:
7
-
Num with malformed responses
:
7
-
User asks
:
15
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model deepseek/deepseek-reasoner
-
Date
:
2025-01-20
-
Versions
:
0.71.2.dev
-
Seconds per case
:
113.7
-
Total cost
:
5.4193
|
|
claude-sonnet-4-20250514 (no thinking) |
56.4%
|
$15.82
|
aider --model claude-sonnet-4-20250514 |
98.2% |
diff |
-
Dirname
:
2025-05-24-21-17-54--sonnet4-diff-exuser
-
Test cases
:
225
-
Model
:
claude-sonnet-4-20250514 (no thinking)
-
Edit format
:
diff
-
Commit hash
:
ef3f8bb-dirty
-
Pass rate 1
:
20.4
-
Pass rate 2
:
56.4
-
Pass num 1
:
46
-
Pass num 2
:
127
-
Percent cases well formed
:
98.2
-
Error outputs
:
6
-
Num malformed responses
:
4
-
Num with malformed responses
:
4
-
User asks
:
129
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Prompt tokens
:
3460663
-
Completion tokens
:
433373
-
Test timeouts
:
7
-
Total tests
:
225
-
Command
:
aider --model claude-sonnet-4-20250514
-
Date
:
2025-05-24
-
Versions
:
0.83.3.dev
-
Seconds per case
:
29.8
-
Total cost
:
15.8155
|
|
gemini-2.5-flash-preview-05-20 (24k think) |
55.1%
|
$8.56
|
aider --model gemini/gemini-2.5-flash-preview-05-20 |
95.6% |
diff |
-
Dirname
:
2025-05-25-22-58-44--flash25-05-20-24k-think
-
Test cases
:
225
-
Model
:
gemini-2.5-flash-preview-05-20 (24k think)
-
Edit format
:
diff
-
Commit hash
:
a8568c3-dirty
-
Thinking tokens
:
24576
-
Pass rate 1
:
26.2
-
Pass rate 2
:
55.1
-
Pass num 1
:
59
-
Pass num 2
:
124
-
Percent cases well formed
:
95.6
-
Error outputs
:
15
-
Num malformed responses
:
15
-
Num with malformed responses
:
10
-
User asks
:
101
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
3666792
-
Completion tokens
:
2703162
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-flash-preview-05-20
-
Date
:
2025-05-25
-
Versions
:
0.83.3.dev
-
Seconds per case
:
53.9
-
Total cost
:
8.5625
|
|
DeepSeek V3 (0324) |
55.1%
|
$1.12
|
aider --model deepseek/deepseek-chat |
99.6% |
diff |
-
Dirname
:
2025-03-24-15-41-33--deepseek-v3-0324-polyglot-diff
-
Test cases
:
225
-
Model
:
DeepSeek V3 (0324)
-
Edit format
:
diff
-
Commit hash
:
502b863
-
Pass rate 1
:
28.0
-
Pass rate 2
:
55.1
-
Pass num 1
:
63
-
Pass num 2
:
124
-
Percent cases well formed
:
99.6
-
Error outputs
:
32
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
96
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
2
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model deepseek/deepseek-chat
-
Date
:
2025-03-24
-
Versions
:
0.78.1.dev
-
Seconds per case
:
290.0
-
Total cost
:
1.1164
|
|
Quasar Alpha |
54.7%
|
|
aider --model openrouter/openrouter/quasar-alpha |
98.2% |
diff |
-
Dirname
:
2025-04-04-02-57-25--qalpha-diff-exsys
-
Test cases
:
225
-
Model
:
Quasar Alpha
-
Edit format
:
diff
-
Commit hash
:
8a34a6c-dirty
-
Pass rate 1
:
21.8
-
Pass rate 2
:
54.7
-
Pass num 1
:
49
-
Pass num 2
:
123
-
Percent cases well formed
:
98.2
-
Error outputs
:
4
-
Num malformed responses
:
4
-
Num with malformed responses
:
4
-
User asks
:
187
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model openrouter/openrouter/quasar-alpha
-
Date
:
2025-04-04
-
Versions
:
0.80.5.dev
-
Seconds per case
:
14.8
-
Total cost
:
0.0
|
|
o3-mini (medium) |
53.8%
|
$8.86
|
aider --model o3-mini |
95.1% |
diff |
-
Dirname
:
2025-01-31-20-27-46--o3-mini-diff2
-
Test cases
:
225
-
Model
:
o3-mini (medium)
-
Edit format
:
diff
-
Commit hash
:
2fb517b-dirty
-
Pass rate 1
:
19.1
-
Pass rate 2
:
53.8
-
Pass num 1
:
43
-
Pass num 2
:
121
-
Percent cases well formed
:
95.1
-
Error outputs
:
28
-
Num malformed responses
:
28
-
Num with malformed responses
:
11
-
User asks
:
17
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model o3-mini
-
Date
:
2025-01-31
-
Versions
:
0.72.4.dev
-
Seconds per case
:
47.2
-
Total cost
:
8.8599
|
|
Grok 3 Beta |
53.3%
|
$11.03
|
aider --model openrouter/x-ai/grok-3-beta |
99.6% |
diff |
-
Dirname
:
2025-04-10-04-21-31--grok3-diff-exuser
-
Test cases
:
225
-
Model
:
Grok 3 Beta
-
Edit format
:
diff
-
Commit hash
:
2dd40fc-dirty
-
Pass rate 1
:
22.2
-
Pass rate 2
:
53.3
-
Pass num 1
:
50
-
Pass num 2
:
120
-
Percent cases well formed
:
99.6
-
Error outputs
:
1
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
68
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model openrouter/x-ai/grok-3-beta
-
Date
:
2025-04-10
-
Versions
:
0.81.2.dev
-
Seconds per case
:
15.3
-
Total cost
:
11.0338
|
|
Optimus Alpha |
52.9%
|
|
aider --model openrouter/openrouter/optimus-alpha |
97.3% |
diff |
-
Dirname
:
2025-04-10-19-02-44--oalpha-diff-exsys
-
Test cases
:
225
-
Model
:
Optimus Alpha
-
Edit format
:
diff
-
Commit hash
:
532bc45-dirty
-
Pass rate 1
:
21.3
-
Pass rate 2
:
52.9
-
Pass num 1
:
48
-
Pass num 2
:
119
-
Percent cases well formed
:
97.3
-
Error outputs
:
7
-
Num malformed responses
:
6
-
Num with malformed responses
:
6
-
User asks
:
182
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model openrouter/openrouter/optimus-alpha
-
Date
:
2025-04-10
-
Versions
:
0.81.2.dev
-
Seconds per case
:
18.4
-
Total cost
:
0.0
|
|
gpt-4.1 |
52.4%
|
$9.86
|
aider --model gpt-4.1 |
98.2% |
diff |
-
Dirname
:
2025-04-14-21-05-54--gpt41-diff-exuser
-
Test cases
:
225
-
Model
:
gpt-4.1
-
Edit format
:
diff
-
Commit hash
:
7a87db5-dirty
-
Pass rate 1
:
20.0
-
Pass rate 2
:
52.4
-
Pass num 1
:
45
-
Pass num 2
:
118
-
Percent cases well formed
:
98.2
-
Error outputs
:
6
-
Num malformed responses
:
5
-
Num with malformed responses
:
4
-
User asks
:
171
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model gpt-4.1
-
Date
:
2025-04-14
-
Versions
:
0.81.4.dev
-
Seconds per case
:
20.5
-
Total cost
:
9.8556
|
|
claude-3-5-sonnet-20241022 |
51.6%
|
$14.41
|
aider --model claude-3-5-sonnet-20241022 |
99.6% |
diff |
-
Dirname
:
2025-01-17-19-44-33--sonnet-baseline-jan-17
-
Test cases
:
225
-
Model
:
claude-3-5-sonnet-20241022
-
Edit format
:
diff
-
Commit hash
:
6451d59
-
Pass rate 1
:
22.2
-
Pass rate 2
:
51.6
-
Pass num 1
:
50
-
Pass num 2
:
116
-
Percent cases well formed
:
99.6
-
Error outputs
:
2
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
11
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
8
-
Total tests
:
225
-
Command
:
aider --model claude-3-5-sonnet-20241022
-
Date
:
2025-01-17
-
Versions
:
0.71.2.dev
-
Seconds per case
:
21.4
-
Total cost
:
14.4063
|
|
Grok 3 Mini Beta (high) |
49.3%
|
$0.73
|
aider --model xai/grok-3-mini-beta --reasoning-effort high |
99.6% |
whole |
-
Dirname
:
2025-04-10-23-59-02--xai-grok3-mini-whole-high
-
Test cases
:
225
-
Model
:
Grok 3 Mini Beta (high)
-
Edit format
:
whole
-
Commit hash
:
8ee33da-dirty
-
Pass rate 1
:
17.3
-
Pass rate 2
:
49.3
-
Pass num 1
:
39
-
Pass num 2
:
111
-
Percent cases well formed
:
99.6
-
Error outputs
:
1
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
64
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
0
-
Total tests
:
225
-
Command
:
aider --model xai/grok-3-mini-beta --reasoning-effort high
-
Date
:
2025-04-10
-
Versions
:
0.81.3.dev
-
Seconds per case
:
79.1
-
Total cost
:
0.7346
|
|
DeepSeek Chat V3 (prev) |
48.4%
|
$0.34
|
aider --model deepseek/deepseek-chat |
98.7% |
diff |
-
Dirname
:
2024-12-25-13-31-51--deepseekv3preview-diff2
-
Test cases
:
225
-
Model
:
DeepSeek Chat V3 (prev)
-
Edit format
:
diff
-
Commit hash
:
0a23c4a-dirty
-
Pass rate 1
:
22.7
-
Pass rate 2
:
48.4
-
Pass num 1
:
51
-
Pass num 2
:
109
-
Percent cases well formed
:
98.7
-
Error outputs
:
7
-
Num malformed responses
:
7
-
Num with malformed responses
:
3
-
User asks
:
19
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
8
-
Total tests
:
225
-
Command
:
aider --model deepseek/deepseek-chat
-
Date
:
2024-12-25
-
Versions
:
0.69.2.dev
-
Seconds per case
:
34.8
-
Total cost
:
0.3369
|
|
gemini-2.5-flash-preview-04-17 (default) |
47.1%
|
$1.85
|
aider --model gemini/gemini-2.5-flash-preview-04-17 |
85.3% |
diff |
-
Dirname
:
2025-04-20-19-54-31--flash25-diff-no-think
-
Test cases
:
225
-
Model
:
gemini-2.5-flash-preview-04-17 (default)
-
Edit format
:
diff
-
Commit hash
:
7fcce5d-dirty
-
Pass rate 1
:
21.8
-
Pass rate 2
:
47.1
-
Pass num 1
:
49
-
Pass num 2
:
106
-
Percent cases well formed
:
85.3
-
Error outputs
:
60
-
Num malformed responses
:
55
-
Num with malformed responses
:
33
-
User asks
:
82
-
Lazy comments
:
1
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
5
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-flash-preview-04-17
-
Date
:
2025-04-20
-
Versions
:
0.82.3.dev
-
Seconds per case
:
50.1
-
Total cost
:
1.8451
|
|
chatgpt-4o-latest (2025-03-29) |
45.3%
|
$19.74
|
aider --model chatgpt-4o-latest |
64.4% |
diff |
-
Dirname
:
2025-03-29-05-24-55--chatgpt4o-mar28-diff
-
Test cases
:
225
-
Model
:
chatgpt-4o-latest (2025-03-29)
-
Edit format
:
diff
-
Commit hash
:
0decbad
-
Pass rate 1
:
16.4
-
Pass rate 2
:
45.3
-
Pass num 1
:
37
-
Pass num 2
:
102
-
Percent cases well formed
:
64.4
-
Error outputs
:
85
-
Num malformed responses
:
85
-
Num with malformed responses
:
80
-
User asks
:
174
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model chatgpt-4o-latest
-
Date
:
2025-03-29
-
Versions
:
0.79.3.dev
-
Seconds per case
:
10.3
-
Total cost
:
19.7416
|
|
gpt-4.5-preview |
44.9%
|
$183.18
|
aider --model openai/gpt-4.5-preview |
97.3% |
diff |
-
Dirname
:
2025-02-27-20-26-15--gpt45-diff3
-
Test cases
:
224
-
Model
:
gpt-4.5-preview
-
Edit format
:
diff
-
Commit hash
:
b462e55-dirty
-
Pass rate 1
:
22.3
-
Pass rate 2
:
44.9
-
Pass num 1
:
50
-
Pass num 2
:
101
-
Percent cases well formed
:
97.3
-
Error outputs
:
10
-
Num malformed responses
:
8
-
Num with malformed responses
:
6
-
User asks
:
15
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model openai/gpt-4.5-preview
-
Date
:
2025-02-27
-
Versions
:
0.75.2.dev
-
Seconds per case
:
113.5
-
Total cost
:
183.1802
|
|
gemini-2.5-flash-preview-05-20 (no think) |
44.0%
|
$1.14
|
aider --model gemini/gemini-2.5-flash-preview-05-20 |
93.8% |
diff |
-
Dirname
:
2025-05-26-15-56-31--flash25-05-20-24k-think
-
Test cases
:
225
-
Model
:
gemini-2.5-flash-preview-05-20 (no think)
-
Edit format
:
diff
-
Commit hash
:
214b811-dirty
-
Thinking tokens
:
0
-
Pass rate 1
:
20.9
-
Pass rate 2
:
44.0
-
Pass num 1
:
47
-
Pass num 2
:
99
-
Percent cases well formed
:
93.8
-
Error outputs
:
16
-
Num malformed responses
:
16
-
Num with malformed responses
:
14
-
User asks
:
79
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Prompt tokens
:
5512458
-
Completion tokens
:
514145
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.5-flash-preview-05-20
-
Date
:
2025-05-26
-
Versions
:
0.83.3.dev
-
Seconds per case
:
12.2
-
Total cost
:
1.1354
|
|
Qwen3 32B |
40.0%
|
$0.76
|
aider --model openrouter/qwen/qwen3-32b |
83.6% |
diff |
-
Dirname
:
2025-05-08-03-20-24--qwen3-32b-default
-
Test cases
:
225
-
Model
:
Qwen3 32B
-
Edit format
:
diff
-
Commit hash
:
aaacee5-dirty, aeaf259
-
Pass rate 1
:
14.2
-
Pass rate 2
:
40.0
-
Pass num 1
:
32
-
Pass num 2
:
90
-
Percent cases well formed
:
83.6
-
Error outputs
:
119
-
Num malformed responses
:
50
-
Num with malformed responses
:
37
-
User asks
:
97
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
12
-
Prompt tokens
:
317591
-
Completion tokens
:
120418
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model openrouter/qwen/qwen3-32b
-
Date
:
2025-05-08
-
Versions
:
0.82.4.dev
-
Seconds per case
:
372.2
-
Total cost
:
0.7603
|
|
gemini-exp-1206 |
38.2%
|
|
aider --model gemini/gemini-exp-1206 |
98.2% |
whole |
-
Dirname
:
2024-12-22-18-43-25--gemini-exp-1206-polyglot-whole-2
-
Test cases
:
225
-
Model
:
gemini-exp-1206
-
Edit format
:
whole
-
Commit hash
:
b1bc2f8
-
Pass rate 1
:
19.6
-
Pass rate 2
:
38.2
-
Pass num 1
:
44
-
Pass num 2
:
86
-
Percent cases well formed
:
98.2
-
Error outputs
:
8
-
Num malformed responses
:
8
-
Num with malformed responses
:
4
-
User asks
:
32
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
9
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-exp-1206
-
Date
:
2024-12-22
-
Versions
:
0.69.2.dev
-
Seconds per case
:
45.5
-
Total cost
:
0.0
|
|
Gemini 2.0 Pro exp-02-05 |
35.6%
|
|
aider --model gemini/gemini-2.0-pro-exp-02-05 |
100.0% |
whole |
-
Dirname
:
2025-02-25-20-23-07--gemini-pro
-
Test cases
:
225
-
Model
:
Gemini 2.0 Pro exp-02-05
-
Edit format
:
whole
-
Commit hash
:
2fccd47
-
Pass rate 1
:
20.4
-
Pass rate 2
:
35.6
-
Pass num 1
:
46
-
Pass num 2
:
80
-
Percent cases well formed
:
100.0
-
Error outputs
:
430
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
13
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.0-pro-exp-02-05
-
Date
:
2025-02-25
-
Versions
:
0.75.2.dev
-
Seconds per case
:
34.8
-
Total cost
:
0.0
|
|
Grok 3 Mini Beta (low) |
34.7%
|
$0.79
|
aider --model openrouter/x-ai/grok-3-mini-beta |
100.0% |
whole |
-
Dirname
:
2025-04-10-18-47-24--grok3-mini-whole-exuser
-
Test cases
:
225
-
Model
:
Grok 3 Mini Beta (low)
-
Edit format
:
whole
-
Commit hash
:
14ffe77-dirty
-
Pass rate 1
:
11.1
-
Pass rate 2
:
34.7
-
Pass num 1
:
25
-
Pass num 2
:
78
-
Percent cases well formed
:
100.0
-
Error outputs
:
3
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
73
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model openrouter/x-ai/grok-3-mini-beta
-
Date
:
2025-04-10
-
Versions
:
0.81.2.dev
-
Seconds per case
:
35.1
-
Total cost
:
0.7856
|
|
o1-mini-2024-09-12 |
32.9%
|
$18.58
|
aider --model o1-mini |
96.9% |
whole |
-
Dirname
:
2024-12-22-21-26-35--polyglot-o1mini-whole
-
Test cases
:
225
-
Model
:
o1-mini-2024-09-12
-
Edit format
:
whole
-
Commit hash
:
37df899
-
Pass rate 1
:
5.8
-
Pass rate 2
:
32.9
-
Pass num 1
:
13
-
Pass num 2
:
74
-
Percent cases well formed
:
96.9
-
Error outputs
:
8
-
Num malformed responses
:
8
-
Num with malformed responses
:
7
-
User asks
:
27
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model o1-mini
-
Date
:
2024-12-22
-
Versions
:
0.69.2.dev
-
Seconds per case
:
34.7
-
Total cost
:
18.577
|
|
gpt-4.1-mini |
32.4%
|
$1.99
|
aider --model gpt-4.1-mini |
92.4% |
diff |
-
Dirname
:
2025-04-14-21-27-53--gpt41mini-diff
-
Test cases
:
225
-
Model
:
gpt-4.1-mini
-
Edit format
:
diff
-
Commit hash
:
ffb743e-dirty
-
Pass rate 1
:
11.1
-
Pass rate 2
:
32.4
-
Pass num 1
:
25
-
Pass num 2
:
73
-
Percent cases well formed
:
92.4
-
Error outputs
:
64
-
Num malformed responses
:
62
-
Num with malformed responses
:
17
-
User asks
:
159
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
2
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model gpt-4.1-mini
-
Date
:
2025-04-14
-
Versions
:
0.81.4.dev
-
Seconds per case
:
19.5
-
Total cost
:
1.9918
|
|
claude-3-5-haiku-20241022 |
28.0%
|
$6.06
|
aider --model claude-3-5-haiku-20241022 |
91.1% |
diff |
-
Dirname
:
2024-12-21-21-46-27--polyglot-haiku-diff
-
Test cases
:
225
-
Model
:
claude-3-5-haiku-20241022
-
Edit format
:
diff
-
Commit hash
:
a755079-dirty
-
Pass rate 1
:
7.1
-
Pass rate 2
:
28.0
-
Pass num 1
:
16
-
Pass num 2
:
63
-
Percent cases well formed
:
91.1
-
Error outputs
:
31
-
Num malformed responses
:
30
-
Num with malformed responses
:
20
-
User asks
:
13
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
9
-
Total tests
:
225
-
Command
:
aider --model claude-3-5-haiku-20241022
-
Date
:
2024-12-21
-
Versions
:
0.69.2.dev
-
Seconds per case
:
31.8
-
Total cost
:
6.0583
|
|
chatgpt-4o-latest (2025-02-15) |
27.1%
|
$14.37
|
aider --model chatgpt-4o-latest |
93.3% |
diff |
-
Dirname
:
2025-02-15-19-51-22--chatgpt4o-feb15-diff
-
Test cases
:
223
-
Model
:
chatgpt-4o-latest (2025-02-15)
-
Edit format
:
diff
-
Commit hash
:
108ce18-dirty
-
Pass rate 1
:
9.0
-
Pass rate 2
:
27.1
-
Pass num 1
:
20
-
Pass num 2
:
61
-
Percent cases well formed
:
93.3
-
Error outputs
:
66
-
Num malformed responses
:
21
-
Num with malformed responses
:
15
-
User asks
:
57
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model chatgpt-4o-latest
-
Date
:
2025-02-15
-
Versions
:
0.74.3.dev
-
Seconds per case
:
12.4
-
Total cost
:
14.3703
|
|
QwQ-32B + Qwen 2.5 Coder Instruct |
26.2%
|
|
aider --model fireworks_ai/accounts/fireworks/models/qwq-32b --architect |
100.0% |
architect |
-
Dirname
:
2025-03-07-15-11-27--qwq32b-arch-temp-topp-again
-
Test cases
:
225
-
Model
:
QwQ-32B + Qwen 2.5 Coder Instruct
-
Edit format
:
architect
-
Commit hash
:
52162a5
-
Editor model
:
fireworks_ai/accounts/fireworks/models/qwen2p5-coder-32b-instruct
-
Editor edit format
:
editor-diff
-
Pass rate 1
:
9.8
-
Pass rate 2
:
26.2
-
Pass num 1
:
22
-
Pass num 2
:
59
-
Percent cases well formed
:
100.0
-
Error outputs
:
122
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
489
-
Lazy comments
:
8
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model fireworks_ai/accounts/fireworks/models/qwq-32b --architect
-
Date
:
2025-03-07
-
Versions
:
0.75.3.dev
-
Seconds per case
:
137.4
-
Total cost
:
0
|
|
gpt-4o-2024-08-06 |
23.1%
|
$7.03
|
aider --model gpt-4o-2024-08-06 |
94.2% |
diff |
-
Dirname
:
2024-12-30-20-44-54--gpt4o-ex-as-sys-clean-prompt
-
Test cases
:
225
-
Model
:
gpt-4o-2024-08-06
-
Edit format
:
diff
-
Commit hash
:
09ee197-dirty
-
Pass rate 1
:
4.9
-
Pass rate 2
:
23.1
-
Pass num 1
:
11
-
Pass num 2
:
52
-
Percent cases well formed
:
94.2
-
Error outputs
:
21
-
Num malformed responses
:
21
-
Num with malformed responses
:
13
-
User asks
:
65
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model gpt-4o-2024-08-06
-
Date
:
2024-12-30
-
Versions
:
0.70.1.dev
-
Seconds per case
:
16.0
-
Total cost
:
7.0286
|
|
gemini-2.0-flash-exp |
22.2%
|
|
aider --model gemini/gemini-2.0-flash-exp |
100.0% |
whole |
-
Dirname
:
2024-12-22-20-08-13--gemini-2.0-flash-exp-polyglot-whole
-
Test cases
:
225
-
Model
:
gemini-2.0-flash-exp
-
Edit format
:
whole
-
Commit hash
:
b1bc2f8
-
Pass rate 1
:
11.6
-
Pass rate 2
:
22.2
-
Pass num 1
:
26
-
Pass num 2
:
50
-
Percent cases well formed
:
100.0
-
Error outputs
:
1
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
9
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
8
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.0-flash-exp
-
Date
:
2024-12-22
-
Versions
:
0.69.2.dev
-
Seconds per case
:
12.2
-
Total cost
:
0.0
|
|
qwen-max-2025-01-25 |
21.8%
|
|
OPENAI_API_BASE=https://dashscope-intl.aliyuncs.com/compatible-mode/v1 aider --model openai/qwen-max-2025-01-25 |
90.2% |
diff |
-
Dirname
:
2025-01-28-16-00-03--qwen-max-2025-01-25-polyglot-diff
-
Test cases
:
225
-
Model
:
qwen-max-2025-01-25
-
Edit format
:
diff
-
Commit hash
:
ae7d459
-
Pass rate 1
:
9.3
-
Pass rate 2
:
21.8
-
Pass num 1
:
21
-
Pass num 2
:
49
-
Percent cases well formed
:
90.2
-
Error outputs
:
46
-
Num malformed responses
:
44
-
Num with malformed responses
:
22
-
User asks
:
23
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
9
-
Total tests
:
225
-
Command
:
OPENAI_API_BASE=https://dashscope-intl.aliyuncs.com/compatible-mode/v1 aider --model openai/qwen-max-2025-01-25
-
Date
:
2025-01-28
-
Versions
:
0.72.4.dev
-
Seconds per case
:
39.5
|
|
QwQ-32B |
20.9%
|
|
aider --model fireworks_ai/accounts/fireworks/models/qwq-32b |
67.6% |
diff |
-
Dirname
:
2025-03-06-17-40-24--qwq32b-diff-temp-topp-ex-sys-remind-user-for-real
-
Test cases
:
225
-
Model
:
QwQ-32B
-
Edit format
:
diff
-
Commit hash
:
51d118f-dirty
-
Pass rate 1
:
8.0
-
Pass rate 2
:
20.9
-
Pass num 1
:
18
-
Pass num 2
:
47
-
Percent cases well formed
:
67.6
-
Error outputs
:
145
-
Num malformed responses
:
143
-
Num with malformed responses
:
73
-
User asks
:
17
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model fireworks_ai/accounts/fireworks/models/qwq-32b
-
Date
:
2025-03-06
-
Versions
:
0.75.3.dev
-
Seconds per case
:
228.6
-
Total cost
:
0.0
|
|
gemini-2.0-flash-thinking-exp-01-21 |
18.2%
|
|
aider --model gemini/gemini-2.0-flash-thinking-exp-01-21 |
77.8% |
diff |
-
Dirname
:
2025-01-21-22-51-49--gemini-2.0-flash-thinking-exp-01-21-polyglot-diff
-
Test cases
:
225
-
Model
:
gemini-2.0-flash-thinking-exp-01-21
-
Edit format
:
diff
-
Commit hash
:
843720a
-
Pass rate 1
:
5.8
-
Pass rate 2
:
18.2
-
Pass num 1
:
13
-
Pass num 2
:
41
-
Percent cases well formed
:
77.8
-
Error outputs
:
182
-
Num malformed responses
:
180
-
Num with malformed responses
:
50
-
User asks
:
26
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
2
-
Test timeouts
:
7
-
Total tests
:
225
-
Command
:
aider --model gemini/gemini-2.0-flash-thinking-exp-01-21
-
Date
:
2025-01-21
-
Versions
:
0.72.2.dev
-
Seconds per case
:
24.2
-
Total cost
:
0.0
|
|
gpt-4o-2024-11-20 |
18.2%
|
$6.74
|
aider --model gpt-4o-2024-11-20 |
95.1% |
diff |
-
Dirname
:
2024-12-30-20-57-12--gpt-4o-2024-11-20-ex-as-sys
-
Test cases
:
225
-
Model
:
gpt-4o-2024-11-20
-
Edit format
:
diff
-
Commit hash
:
09ee197-dirty
-
Pass rate 1
:
4.9
-
Pass rate 2
:
18.2
-
Pass num 1
:
11
-
Pass num 2
:
41
-
Percent cases well formed
:
95.1
-
Error outputs
:
12
-
Num malformed responses
:
12
-
Num with malformed responses
:
11
-
User asks
:
53
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
12
-
Total tests
:
225
-
Command
:
aider --model gpt-4o-2024-11-20
-
Date
:
2024-12-30
-
Versions
:
0.70.1.dev
-
Seconds per case
:
12.1
-
Total cost
:
6.7351
|
|
DeepSeek Chat V2.5 |
17.8%
|
$0.51
|
aider --model deepseek/deepseek-chat |
92.9% |
diff |
-
Dirname
:
2024-12-21-20-56-21--polyglot-deepseek-diff
-
Test cases
:
225
-
Model
:
DeepSeek Chat V2.5
-
Edit format
:
diff
-
Commit hash
:
a755079-dirty
-
Pass rate 1
:
5.3
-
Pass rate 2
:
17.8
-
Pass num 1
:
12
-
Pass num 2
:
40
-
Percent cases well formed
:
92.9
-
Error outputs
:
42
-
Num malformed responses
:
37
-
Num with malformed responses
:
16
-
User asks
:
23
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
5
-
Test timeouts
:
5
-
Total tests
:
225
-
Command
:
aider --model deepseek/deepseek-chat
-
Date
:
2024-12-21
-
Versions
:
0.69.2.dev
-
Seconds per case
:
184.0
-
Total cost
:
0.5101
|
|
Qwen2.5-Coder-32B-Instruct |
16.4%
|
|
aider --model openai/Qwen2.5-Coder-32B-Instruct |
99.6% |
whole |
-
Dirname
:
2024-12-26-00-55-20--Qwen2.5-Coder-32B-Instruct
-
Test cases
:
225
-
Model
:
Qwen2.5-Coder-32B-Instruct
-
Edit format
:
whole
-
Commit hash
:
b51768b0
-
Pass rate 1
:
4.9
-
Pass rate 2
:
16.4
-
Pass num 1
:
11
-
Pass num 2
:
37
-
Percent cases well formed
:
99.6
-
Error outputs
:
1
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
33
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
6
-
Total tests
:
225
-
Command
:
aider --model openai/Qwen2.5-Coder-32B-Instruct
-
Date
:
2024-12-26
-
Versions
:
0.69.2.dev
-
Seconds per case
:
42.0
-
Total cost
:
0.0
|
|
Llama 4 Maverick |
15.6%
|
|
aider --model nvidia_nim/meta/llama-4-maverick-17b-128e-instruct |
99.1% |
whole |
-
Dirname
:
2025-04-06-08-39-52--llama-4-maverick-17b-128e-instruct-polyglot-whole
-
Test cases
:
225
-
Model
:
Llama 4 Maverick
-
Edit format
:
whole
-
Commit hash
:
9445a31
-
Pass rate 1
:
4.4
-
Pass rate 2
:
15.6
-
Pass num 1
:
10
-
Pass num 2
:
35
-
Percent cases well formed
:
99.1
-
Error outputs
:
12
-
Num malformed responses
:
2
-
Num with malformed responses
:
2
-
User asks
:
248
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model nvidia_nim/meta/llama-4-maverick-17b-128e-instruct
-
Date
:
2025-04-06
-
Versions
:
0.81.2.dev
-
Seconds per case
:
20.5
-
Total cost
:
0.0
|
|
yi-lightning |
12.9%
|
|
aider --model openai/yi-lightning |
92.9% |
whole |
-
Dirname
:
2024-12-23-01-11-56--yi-test
-
Test cases
:
225
-
Model
:
yi-lightning
-
Edit format
:
whole
-
Commit hash
:
2b1625e
-
Pass rate 1
:
5.8
-
Pass rate 2
:
12.9
-
Pass num 1
:
13
-
Pass num 2
:
29
-
Percent cases well formed
:
92.9
-
Error outputs
:
87
-
Num malformed responses
:
72
-
Num with malformed responses
:
16
-
User asks
:
107
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
6
-
Total tests
:
225
-
Command
:
aider --model openai/yi-lightning
-
Date
:
2024-12-23
-
Versions
:
0.69.2.dev
-
Seconds per case
:
146.7
-
Total cost
:
0.0
|
|
command-a-03-2025-quality |
12.0%
|
|
OPENAI_API_BASE=https://api.cohere.ai/compatibility/v1 aider --model openai/command-a-03-2025-quality |
99.6% |
whole |
-
Dirname
:
2025-03-14-23-40-00--cmda-quality-whole2
-
Test cases
:
225
-
Model
:
command-a-03-2025-quality
-
Edit format
:
whole
-
Commit hash
:
a1aa63f
-
Pass rate 1
:
2.2
-
Pass rate 2
:
12.0
-
Pass num 1
:
5
-
Pass num 2
:
27
-
Percent cases well formed
:
99.6
-
Error outputs
:
2
-
Num malformed responses
:
1
-
Num with malformed responses
:
1
-
User asks
:
215
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
OPENAI_API_BASE=https://api.cohere.ai/compatibility/v1 aider --model openai/command-a-03-2025-quality
-
Date
:
2025-03-14
-
Versions
:
0.77.1.dev
-
Seconds per case
:
85.1
-
Total cost
:
0.0
|
|
Codestral 25.01 |
11.1%
|
$1.98
|
aider --model mistral/codestral-latest |
100.0% |
whole |
-
Dirname
:
2025-01-13-18-17-25--codestral-whole2
-
Test cases
:
225
-
Model
:
Codestral 25.01
-
Edit format
:
whole
-
Commit hash
:
0cba898-dirty
-
Pass rate 1
:
4.0
-
Pass rate 2
:
11.1
-
Pass num 1
:
9
-
Pass num 2
:
25
-
Percent cases well formed
:
100.0
-
Error outputs
:
0
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
47
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
4
-
Total tests
:
225
-
Command
:
aider --model mistral/codestral-latest
-
Date
:
2025-01-13
-
Versions
:
0.71.2.dev
-
Seconds per case
:
9.3
-
Total cost
:
1.9834
|
|
openhands-lm-32b-v0.1 |
10.2%
|
|
aider --model openrouter/all-hands/openhands-lm-32b-v0.1 |
95.1% |
whole |
-
Dirname
:
2025-04-19-14-43-04--o4-mini-patch
-
Test cases
:
225
-
Model
:
openhands-lm-32b-v0.1
-
Edit format
:
whole
-
Commit hash
:
c08336f
-
Pass rate 1
:
4.0
-
Pass rate 2
:
10.2
-
Pass num 1
:
9
-
Pass num 2
:
23
-
Percent cases well formed
:
95.1
-
Error outputs
:
55
-
Num malformed responses
:
41
-
Num with malformed responses
:
11
-
User asks
:
166
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
11
-
Total tests
:
225
-
Command
:
aider --model openrouter/all-hands/openhands-lm-32b-v0.1
-
Date
:
2025-04-19
-
Versions
:
0.82.2.dev
-
Seconds per case
:
195.6
-
Total cost
:
0.0
|
|
gpt-4.1-nano |
8.9%
|
$0.43
|
aider --model gpt-4.1-nano |
94.2% |
whole |
-
Dirname
:
2025-04-14-22-46-01--gpt41nano-diff
-
Test cases
:
225
-
Model
:
gpt-4.1-nano
-
Edit format
:
whole
-
Commit hash
:
71d1591-dirty
-
Pass rate 1
:
3.1
-
Pass rate 2
:
8.9
-
Pass num 1
:
7
-
Pass num 2
:
20
-
Percent cases well formed
:
94.2
-
Error outputs
:
20
-
Num malformed responses
:
20
-
Num with malformed responses
:
13
-
User asks
:
316
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
8
-
Total tests
:
225
-
Command
:
aider --model gpt-4.1-nano
-
Date
:
2025-04-14
-
Versions
:
0.81.4.dev
-
Seconds per case
:
12.0
-
Total cost
:
0.4281
|
|
Qwen2.5-Coder-32B-Instruct |
8.0%
|
|
aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct # via hyperbolic |
71.6% |
diff |
-
Dirname
:
2024-12-22-13-22-32--polyglot-qwen-diff
-
Test cases
:
225
-
Model
:
Qwen2.5-Coder-32B-Instruct
-
Edit format
:
diff
-
Commit hash
:
6d7e8be-dirty
-
Pass rate 1
:
4.4
-
Pass rate 2
:
8.0
-
Pass num 1
:
10
-
Pass num 2
:
18
-
Percent cases well formed
:
71.6
-
Error outputs
:
158
-
Num malformed responses
:
148
-
Num with malformed responses
:
64
-
User asks
:
132
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
2
-
Total tests
:
225
-
Command
:
aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct # via hyperbolic
-
Date
:
2024-12-22
-
Versions
:
0.69.2.dev
-
Seconds per case
:
84.4
-
Total cost
:
0.0
|
|
gemma-3-27b-it |
4.9%
|
|
aider --model openrouter/google/gemma-3-27b-it |
100.0% |
whole |
-
Dirname
:
2025-03-15-01-21-24--gemma3-27b-or
-
Test cases
:
225
-
Model
:
gemma-3-27b-it
-
Edit format
:
whole
-
Commit hash
:
fd21f51-dirty
-
Pass rate 1
:
1.8
-
Pass rate 2
:
4.9
-
Pass num 1
:
4
-
Pass num 2
:
11
-
Percent cases well formed
:
100.0
-
Error outputs
:
3
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
181
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
1
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model openrouter/google/gemma-3-27b-it
-
Date
:
2025-03-15
-
Versions
:
0.77.1.dev
-
Seconds per case
:
79.7
-
Total cost
:
0.0
|
|
gpt-4o-mini-2024-07-18 |
3.6%
|
$0.32
|
aider --model gpt-4o-mini-2024-07-18 |
100.0% |
whole |
-
Dirname
:
2024-12-21-18-41-18--polyglot-gpt-4o-mini
-
Test cases
:
225
-
Model
:
gpt-4o-mini-2024-07-18
-
Edit format
:
whole
-
Commit hash
:
a755079-dirty
-
Pass rate 1
:
0.9
-
Pass rate 2
:
3.6
-
Pass num 1
:
2
-
Pass num 2
:
8
-
Percent cases well formed
:
100.0
-
Error outputs
:
0
-
Num malformed responses
:
0
-
Num with malformed responses
:
0
-
User asks
:
36
-
Lazy comments
:
0
-
Syntax errors
:
0
-
Indentation errors
:
0
-
Exhausted context windows
:
0
-
Test timeouts
:
3
-
Total tests
:
225
-
Command
:
aider --model gpt-4o-mini-2024-07-18
-
Date
:
2024-12-21
-
Versions
:
0.69.2.dev
-
Seconds per case
:
17.3
-
Total cost
:
0.3236
|