INDEX
Explanations
This neuron detects expressions referring to the “other side” of something (i.e. mentions of going to or being on the opposite side of a place).
New Auto-Interp
Negative Logits
Foo
-0.07
(<
-0.07
climbed
-0.06
.two
-0.06
-number
-0.06
Nora
-0.06
Hayden
-0.06
fv
-0.06
enqueue
-0.06
UPS
-0.06
POSITIVE LOGITS
ymm
0.06
MMI
0.06
바이
0.06
turkey
0.06
modulo
0.06
'})↵
0.06
oader
0.06
datastore
0.06
_sess
0.06
último
0.06
Activations Density 0.005%