INDEX
Explanations
words related to performance evaluation and comparison
mentions of benchmark-related concepts
Negative Logits
agan      -0.88
oles      -0.77
Else      -0.76
oho       -0.73
frey      -0.73
aina      -0.72
gren      -0.72
artney    -0.71
kn        -0.71
yne       -0.71
Positive Logits
benchmark             1.00
benchmarks            0.98
marks                 0.77
guiActiveUnfocused    0.70
scores                0.70
mark                  0.69
score                 0.69
indices               0.68
suite                 0.66
osterone              0.66
Activations Density: 0.007%