Explanations
humiliation
This neuron mainly detects language expressing humiliation or degradation.
Elements of dominance and submission dynamics in sexual relationships.
Negative Logits
Token      Logit
Binary     -0.07
ears       -0.06
ydk        -0.06
English    -0.06
Sp         -0.06
binaries   -0.06
(blank)    -0.06
dling      -0.06
олее       -0.06
raging     -0.06
Positive Logits
Token        Logit
humiliation  0.10
autom        0.08
humiliating  0.07
поє          0.06
heck         0.06
(=           0.06
Raqqa        0.06
Kul          0.06
Norris       0.06
Techniques   0.06
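The page does not say how these logit lists were produced. A common approach, assumed here, is to project the neuron's output direction through the model's unembedding matrix and rank the vocabulary by the resulting per-token effect. The sketch below illustrates that idea with toy numbers; `top_logit_effects`, `neuron_out`, `unembed`, and the four-token vocabulary are all hypothetical and are not the real model's weights.

```python
def top_logit_effects(neuron_out, unembed, vocab, k=2):
    """Rank tokens by the dot product of a neuron's output direction
    with each token's unembedding column (an assumed method)."""
    effects = [
        sum(neuron_out[i] * unembed[i][j] for i in range(len(neuron_out)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy, hypothetical values chosen only to illustrate the ranking.
vocab = ["humiliation", "Binary", "the", "ears"]
neuron_out = [1.0, -0.5]            # d_model = 2
unembed = [                          # shape: d_model x vocab size
    [0.08, -0.05, 0.0, -0.04],
    [-0.04, 0.04, 0.0, 0.04],
]

negative, positive = top_logit_effects(neuron_out, unembed, vocab)
for token, effect in positive:
    print(f"{token:<12} {effect:+.2f}")
for token, effect in negative:
    print(f"{token:<12} {effect:+.2f}")
```

Under this reading, a positive value means the neuron pushes the model toward predicting that token when it fires, and a negative value means it pushes away from it.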
Activations Density: 0.007%
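Activation density is usually read as the fraction of sampled tokens on which the neuron fires. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the sample activations are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000.
acts = [0.0] * 99_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on almost all text, which fits a narrowly scoped explanation such as the ones listed above.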