Explanations
terms related to moral superiority and condescension
Negative Logits
estre     -0.19
uddle     -0.16
ÙĬج       -0.15
estro     -0.14
META      -0.14
eurs      -0.14
buster    -0.14
jh        -0.14
iage      -0.14
Pioneer   -0.13
Positive Logits
_:*       0.17
EMU       0.15
elly      0.14
613       0.14
CHANT     0.14
Gim       0.14
gate      0.14
Cena      0.14
815       0.14
osten     0.14
Activations Density 0.191%