INDEX
Explanations
Dialogues
The neuron activates on the placeholder tokens for character names (e.g. “NAME_1”, “NAME_4”) in the text.
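As a rough illustration of the pattern this explanation describes, the sketch below locates such placeholders in a dialogue transcript. It assumes the placeholders always take the form "NAME_" followed by digits, as in the two examples given. The regular expression, the function name `find_placeholders`, and the sample text are hypothetical. It works on raw characters, not on the model's tokens, so it only approximates where the neuron would be expected to fire.

```python
import re

# Assumption: placeholders look like "NAME_" followed by digits, as in the
# examples "NAME_1" and "NAME_4"; other placeholder schemes are not covered.
PLACEHOLDER = re.compile(r"\bNAME_\d+\b")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (character offset, placeholder) pairs in order of appearance."""
    return [(m.start(), m.group()) for m in PLACEHOLDER.finditer(text)]


# Hypothetical dialogue snippet, made up for illustration only.
sample = "NAME_1: Where were you last night?\nNAME_4: Ask NAME_1, not me."

for offset, name in find_placeholders(sample):
    print(offset, name)
```

A tokenizer may split a placeholder such as "NAME_1" into several tokens, so mapping these character offsets to token positions would need the tokenizer's own offset information.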
Negative Logits
MK                    -0.08
385                   -0.07
humiliating           -0.07
pře                   -0.07
--------------↵       -0.07
ترة                   -0.06
Success               -0.06
Menu                  -0.06
(no visible token)    -0.06
werp                  -0.06
Positive Logits
direccion             0.06
ync                   0.06
synchronize           0.06
�                     0.06
BOOL                  0.06
difficile             0.06
adr                   0.06
ifikasi               0.06
íl                    0.06
<Resource             0.06
Activations Density 0.027%