Explanations
The neuron activates on mentions of “Soviet” (especially “Soviet Union”).
Negative Logits
Gl            -0.07
luder         -0.07
दल            -0.07
kn            -0.06
Archbishop    -0.06
신청          -0.06
ка            -0.06
elephant      -0.06
,address      -0.06
Golden        -0.06
Positive Logits
Soviet        0.14
actual        0.08
essian        0.08
Soviets       0.08
revolving     0.07
совет         0.07
oriented      0.07
Vote          0.07
Sov           0.07
vv            0.07
Activations Density: 0.004%
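
A density figure like this one is usually read as the fraction of token positions on which the neuron fires. The page does not say how it was computed, so the sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the neuron fires on 4 of 100,000 tokens.
toy = [0.0] * 99_996 + [1.3, 0.7, 2.1, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.004%
```

Under this reading, a density of 0.004% would mean the neuron is active on roughly 4 tokens in every 100,000, which fits a neuron that responds to one narrow topic.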