INDEX
Explanations
negations and negative phrases
Negative Logits
overy     -0.18
ayo       -0.16
uzzi      -0.15
uze       -0.14
ogens     -0.14
SSF       -0.13
loth      -0.13
etty      -0.13
ughter    -0.13
Quad      -0.13
Positive Logits
outlet    0.21
sale      0.19
-sale     0.18
cheap     0.18
online    0.18
online    0.17
Outlet    0.17
discount  0.17
Sale      0.16
agli      0.16
Activations Density 0.033%