Explanations
This neuron activates on explicit references to self‐stimulation (e.g., the word “masturbate”).
Negative Logits
_asset      -0.07
kommt       -0.06
seminal     -0.06
Dome        -0.06
byli        -0.06
Sharma      -0.06
розум       -0.06
Weapon      -0.06
ражд        -0.05
스테        -0.05
Positive Logits
ynchron     0.08
onia        0.07
activ       0.06
فاق         0.06
Governor    0.06
marsh       0.06
Рез         0.06
mie         0.06
.TextView   0.06
ikhail      0.06
Activations Density: 0.108%