Explanations
comparisons indicating superiority or excellence
comparisons indicating superiority or preference
Negative Logits
uto      -0.74
urther   -0.71
ALE      -0.68
ERN      -0.65
eeper    -0.65
ensor    -0.64
iosyn    -0.64
ango     -0.64
uria     -0.64
imb      -0.63
Positive Logits
anybody   0.83
anything  0.83
anyone    0.80
ever      0.80
usual     0.79
useless   0.76
ours      0.73
average   0.71
placebo   0.71
average   0.71
Activations Density 0.091%