Explanations
Japanese honorifics
The neuron activates on the Japanese honorific prefixes “お” (o-) and “ご” (go-), which are attached to words to express politeness or respect.
Negative Logits
Kenn        -0.08
Better      -0.07
売          -0.07
murderous   -0.07
survey      -0.07
_Show       -0.06
_upgrade    -0.06
Nh          -0.06
dung        -0.06
marshall    -0.06
Positive Logits
_stock      0.07
otom        0.07
하신        0.06
.custom     0.06
하시        0.06
�           0.06
(![         0.06
výši        0.06
zip         0.06
::::::::::::::::::::::::::::::::  0.06
Activations Density 0.009%
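
As a rough illustration of how a density figure like this can be read, the sketch below assumes that activation density means the percentage of tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as "active" if its activation is
    strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 9 active tokens out of 100,000 in total.
sample = [1.5] * 9 + [0.0] * 99_991
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.009% would correspond to the neuron firing on roughly 9 of every 100,000 tokens, which is consistent with a neuron that responds to a narrow pattern.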