INDEX
Explanations
The neuron activates on phrases describing a hero’s mission to save the world (e.g., “save the world from…”).
Negative Logits
/desktop    -0.07
shadow      -0.06
'+'         -0.06
)))))↵      -0.06
terk        -0.06
LIMIT       -0.06
Fakült      -0.06
oter        -0.06
AreaView    -0.06
původ       -0.06
Positive Logits
稱          0.07
texting     0.06
elere       0.06
rapped      0.06
ovel        0.06
HZ          0.06
analogy     0.06
neum        0.06
_sn         0.06
layers      0.06
Activations Density 0.011%
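
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the share of token positions on which the neuron's activation is above zero; that definition is not stated on this page. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" is the fraction of positions where the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 11 active positions out of 100,000 tokens.
sample = [0.0] * 99_989 + [1.5] * 11
print(f"{activation_density(sample):.3f}%")  # prints 0.011%
```

A density this low is consistent with a neuron that fires only on a narrow family of phrases, such as the "save the world from…" pattern described in the explanation.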