INDEX
Explanations
phrases related to politics and government, including names of political figures, policy proposals, and official statements
Neuron Alignment
Index   Value   % of L₁
658     +0.15   0.5%
478     +0.13   0.4%
1937    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
658     +0.15      0.08
1919    +0.13      0.06
862     +0.10      0.04
Negative Logits
malheure     -1.02
étrang       -1.00
carrefour    -1.00
plong        -0.97
malheureux   -0.93
prétend      -0.88
héro         -0.86
cahier       -0.84
hcm          -0.83
Miscell      -0.82
Positive Logits
aware      0.73
able       0.72
glad       0.71
ready      0.71
willing    0.71
afraid     0.68
gonna      0.68
proud      0.68
unable     0.67
pleased    0.67
Activations Density 0.249%