Phil
phil111
Doesn't stop thinking.
1
9
#3 opened 2 months ago by phil111
Impressive Broad Knowledge
5
8
#12 opened 4 months ago by phil111
This just trades general performance for domain specific gains.
16
11
#3 opened 4 months ago by phil111
Please stop blindly trusting and reporting Alibaba's scores.
7
2
#1 opened 4 months ago by phil111
Weird responses
12
#10 opened 4 months ago by vparth7
Gemma A3B
6
13
#3 opened 4 months ago by Maria99934
gpt-oss is actually good, even on less common benchmarks
7
2
#109 opened 4 months ago by groupfairnessllm
model quality issues
5
#92 opened 4 months ago by TheBigBlockPC
Terrible instruction following
1
4
#3 opened 4 months ago by denisalpino
4b model with an 84.2 MMLU-Redux score?
3
1
#2 opened 4 months ago by phil111
This model is unbelievably ignorant.
40
15
#14 opened 4 months ago by phil111
Knowledge limitations
2
5
#25 opened 4 months ago by hexess
An Improvement, But Q3 30b Still Has Very Little General Knowledge
3
11
#2 opened 4 months ago by phil111
Test Scores Can Be Misleading
1
8
#8 opened 4 months ago by phil111
More Knowledge, But Hard To Extract
1
#29 opened 4 months ago by phil111
The SimpleQA score of the model is WAY off.
4
3
#2 opened 5 months ago by phil111
Qwen3 is great, but could be better.
9
25
#18 opened 7 months ago by phil111
SimpleQA jumped from 12.2 to 54.3?
22
25
#4 opened 5 months ago by phil111
SimpleQA score?
#6 opened 5 months ago by phil111
That SimpleQA score looks too good to be true.
12
19
#1 opened 5 months ago by phil111