Explanations
adult content
The neuron fires on explicit masturbation verbs (e.g. “jerk,” “jerking,” “jerked”).
Negative Logits
Token         Logit
Stad          -0.07
dio           -0.06
bak           -0.06
_relations    -0.06
mayı          -0.06
ransom        -0.06
gratuiti      -0.06
_UTIL         -0.06
άλλ           -0.06
LED           -0.06
Positive Logits
Token         Logit
jerk          0.09
jer           0.07
snap          0.07
Perc          0.07
(^            0.07
shocks        0.07
(rng          0.07
ARK           0.06
vious         0.06
er            0.06
Activations Density: 0.002%