INDEX
Explanations
This neuron does not activate for any tokens (i.e., it detects nothing).
Negative Logits
soaking        -0.08
doesn          -0.07
Simon          -0.07
áp             -0.07
Christopher    -0.07
/St            -0.06
_Com           -0.06
pack           -0.06
Going          -0.06
Pack           -0.06
Positive Logits
대한       0.07
�         0.07
sahibi    0.07
(day      0.06
ж         0.06
đ         0.06
більш     0.06
دوست      0.06
iliar     0.06
0.06
Activations Density 0.026%
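
A density figure like the one above is commonly read as the share of sampled token positions on which the neuron fires. The sketch below shows that reading under stated assumptions: that "active" means an activation strictly above zero, and that density is reported as a percentage. The function name, the threshold parameter, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "active" means strictly greater than `threshold` (0.0 here).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of them with nonzero activation.
sample = [0.0] * 9997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a very low density together with top logit effects this close to zero would be consistent with a neuron that is almost never active.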