INDEX
Explanations
political promises and pledges

Neuron Alignment
Index   Value   % of L₁
168     +0.08   0.2%
991     +0.08   0.2%
950     +0.07   0.2%
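The Value and % of L₁ columns can be illustrated with a minimal sketch. It assumes, since the page does not say so, that Value is the feature's decoder weight on each MLP neuron and that % of L₁ is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment`, its `k` parameter, and the toy vector are hypothetical.

```python
import numpy as np


def neuron_alignment(decoder_dir, k=3):
    """Top-k neurons by signed decoder weight, with each weight's share of the L1 norm.

    Assumption (hypothetical): decoder_dir is the feature's decoder vector in neuron space.
    """
    d = np.asarray(decoder_dir, dtype=float)
    l1 = np.abs(d).sum()
    # Sort by signed value, largest first; kind="stable" keeps ties in index order.
    top = np.argsort(-d, kind="stable")[:k]
    return [(int(i), float(d[i]), float(abs(d[i]) / l1)) for i in top]


# Toy 4-neuron decoder vector (illustrative only, not data from this feature).
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# → [(0, 0.25, 0.25), (2, 0.125, 0.125)]
```

Under this reading, shares as small as 0.2% would mean that no single neuron dominates the feature's decoder direction.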

Correlated Neurons
Index   Pearson Corr.   Cos Sim.
530     +0.08           0.02
247     +0.08           0.03
1155    +0.07           0.04
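The two correlated-neuron columns can be sketched as follows. This assumes the correlation column is a Pearson correlation and that both columns are computed between the feature's per-token activations and a neuron's per-token activations over the same tokens. Cosine similarity is then the uncentred version of the same quantity. The function name `pearson_and_cossim` and the toy series are hypothetical.

```python
import numpy as np


def pearson_and_cossim(feature_acts, neuron_acts):
    """Pearson correlation and cosine similarity between two per-token activation series.

    Pearson correlation equals the cosine similarity of the mean-centred series.
    """
    x = np.asarray(feature_acts, dtype=float)
    y = np.asarray(neuron_acts, dtype=float)
    cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    xc, yc = x - x.mean(), y - y.mean()
    pearson = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return float(pearson), float(cos_sim)


# Two toy series that never fire together: orthogonal by cosine, anti-correlated by Pearson.
print(pearson_and_cossim([0, 1, 0, 1], [1, 0, 1, 0]))
# → (-1.0, 0.0)
```

The toy case shows why the two columns can disagree: centring the series changes what counts as co-activation.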

Negative Logits
reluct      -1.59
fta         -1.58
increa      -1.55
affor       -1.54
stockholm   -1.53
ftu         -1.53
desir       -1.50
disagre     -1.50
purcha      -1.50
effe        -1.48

Positive Logits
promised    0.88
promise     0.85
promises    0.78
never       0.71
pledged     0.71
vowed       0.70
promise     0.67
guarantee   0.67
pledge      0.67
committed   0.65
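The positive and negative logit lists can be read as the feature's direct effect on each vocabulary token. Here is a minimal sketch, assuming the effect is the decoder direction multiplied by the unembedding matrix `W_U` with the final LayerNorm ignored; the page does not state how the values were computed. The function `logit_effects`, the toy `W_U`, and the token names are hypothetical.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Top-k positive and negative direct logit effects of a feature direction.

    Assumption (hypothetical): effect on token t is decoder_dir @ W_U[:, t]; LayerNorm ignored.
    decoder_dir has shape (d_model,), W_U has shape (d_model, vocab_size).
    """
    effects = np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)  # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy model: d_model = 2 and a vocabulary of 3 made-up tokens.
W_U = [[1.0, 0.0, -1.0],
       [0.0, 1.0, 0.0]]
pos, neg = logit_effects([2.0, 0.5], W_U, ["tok_a", "tok_b", "tok_c"], k=1)
print(pos, neg)
# → [('tok_a', 2.0)] [('tok_c', -2.0)]
```

With a real subword tokenizer the vocabulary includes word fragments, which presumably explains entries such as "reluct" and "affor" in the negative list.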

Activation Density: 0.182%
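The activation density figure can be sketched as the fraction of tokens on which the feature fires. This assumes "fires" means an activation above zero over some reference dataset; the page names neither the threshold nor the dataset. Under that reading, 0.182% corresponds to roughly 1.8 tokens per thousand. The function `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold` (hypothetical default 0)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Toy activations over 5 tokens: the feature fires on 2 of them.
density = activation_density([0.0, 0.5, 0.0, 0.0, 2.0])
print(f"{density:.3%}")
# → 40.000%
```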