Explanations
words related to politics, government, and specific political figures
Neuron Alignment
Index   Value   % of L₁
544     +0.14   0.5%
1296    +0.13   0.5%
479     +0.13   0.5%
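The "% of L₁" column is not defined on this page. One plausible reading, offered here only as an assumption, is that each value is reported as its absolute share of the L₁ norm of the feature's direction over neurons. A minimal sketch under that assumption follows; the function name `top_aligned_neurons` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def top_aligned_neurons(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| / sum(|direction|).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by absolute value, largest first.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy direction vector (illustrative only, not the data in the table).
toy = [0.5, -0.25, 0.125, 0.125]
rows = top_aligned_neurons(toy, k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a share of about 0.5% for the top neuron would suggest the direction is spread across many neurons rather than concentrated in a few.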
Correlated Neurons
Index   P. Corr.   Cos Sim.
1296    +0.14      0.05
544     +0.13      0.04
1994    +0.13      0.04
Negative Logits
swarovski   -1.06
ecru        -1.05
hairc       -1.00
suscep      -0.93
unlaw       -0.92
bandeau     -0.92
reluct      -0.88
velour      -0.88
pyjama      -0.87
ingrat      -0.86
Positive Logits
Clinton     1.62
Clinton     1.48
CLIN        0.97
Clint       0.93
CLIN        0.88
Clint       0.82
Hillary     0.81
Hillary     0.80
cl          0.79
cl          0.78
Activations Density 0.054%
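Activation density is commonly taken to be the fraction of sampled tokens on which a feature has a non-zero activation. The page does not state its exact definition, so the sketch below is an assumption, and `activation_density` and the toy data are hypothetical names used for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over four tokens (illustrative only).
toy_acts = [0.0, 0.0, 1.5, 0.0]
density = activation_density(toy_acts)
print(f"{density:.3%}")
```

Read this way, a density of 0.054% would correspond to roughly 5 active tokens per 10,000 sampled.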