INDEX
Explanations
This neuron is effectively “dead” (it never activates for any input tokens).
Negative Logits
Token          Logit
endeavors      -0.08
decorators     -0.07
fz             -0.06
orent          -0.06
fsm            -0.06
izen           -0.06
Lightweight    -0.06
ологіч         -0.06
fir            -0.06
switched       -0.06
Positive Logits
Token          Logit
_FAILED        0.07
ΡΑ             0.07
�              0.06
maximizing     0.06
―              0.06
складу         0.06
()");↵         0.06
보면           0.06
*:             0.06
úspěš          0.06
Activations Density: 0.076%
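
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the neuron's activation exceeds zero, is given below. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only to illustrate the arithmetic: under this assumption, 0.076% corresponds to roughly 76 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens that activate
    the neuron at all. The dashboard's actual definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 76 of which activate the neuron.
sample = [0.0] * 99_924 + [0.5] * 76
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.076%
```

On this reading, a neuron with a very low but nonzero density fires on a small number of tokens rather than on none at all.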