Explanations
This neuron specifically fires on occurrences of the standalone word “Spell.”
Negative Logits
ac       -0.08
Jacobs   -0.07
bicy     -0.07
_nc      -0.07
torso    -0.07
cou      -0.07
국내     -0.07
Kim      -0.07
MAK      -0.07
Davis    -0.07
Positive Logits
Spell      0.13
spell      0.12
Spell      0.10
pell       0.09
spell      0.08
spells     0.08
spelling   0.08
SPELL      0.08
SSL        0.08
ป          0.07
Activations Density 0.005%
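
The page does not define how the density figure is computed. A minimal sketch, assuming it means the percentage of sampled tokens on which the neuron's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and chosen only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read as the share of active tokens; the
    source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1 active token out of 20,000.
sample = [0.0] * 19_999 + [1.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.005% corresponds to roughly one active token in every 20,000, which fits a neuron that responds to a single specific word.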