INDEX
Explanations
This neuron specifically activates on the word “other.”
Negative Logits
porno      -0.07
Э          -0.07
ecast      -0.07
realloc    -0.06
ishly      -0.06
casting    -0.06
制作       -0.06
Reflect    -0.06
la         -0.06
�          -0.06
Positive Logits
other             0.08
�                 0.08
WithIdentifier    0.07
.Regular          0.07
++,               0.07
.sidebar          0.06
loophole          0.06
.Automation       0.06
_PROM             0.06
-other            0.06
Activations Density 0.022%