INDEX
Explanations
phrases expressing varying degrees of comparison
Negative Logits
pps       -0.16
kind      -0.15
bbw       -0.14
Scho      -0.14
è¨İ       -0.14
gres      -0.14
_lazy     -0.14
tighter   -0.14
iaux      -0.14
ess       -0.13
Positive Logits
acic      0.16
eyh       0.15
atta      0.15
Trou      0.14
imli      0.14
terior    0.14
ules      0.14
kee       0.14
urr       0.14
.CONTENT  0.14
Activations Density 0.054%