Explanations
words expressing feelings of disdain, scorn, or contempt
Negative Logits
Token      Logit
Bram       -0.15
커         -0.15
ضی         -0.14
746        -0.14
Kern       -0.14
_IGNORE    -0.14
Penn       -0.14
queeze     -0.14
endance    -0.13
子は       -0.13
Positive Logits
Token      Logit
ible       0.16
sure       0.16
anka       0.16
ky         0.15
agini      0.15
agas       0.15
kle        0.15
akash      0.14
ünchen     0.14
mann       0.14
Activations Density 0.011%