Explanations
phrases related to ridicule and mockery
Negative Logits
ka          -0.17
ka          -0.17
Ka          -0.16
ót          -0.15
å¯          -0.15
нÑıÑĤ        -0.15
lor         -0.14
Ka          -0.14
ksi         -0.14
iplinary    -0.14
Positive Logits
everything  0.45
every       0.43
EVERY       0.39
Everything  0.39
_every      0.38
Every       0.38
Everything  0.37
everything  0.37
everyone    0.37
Every       0.35
Activations Density: 0.057%