Explanations
phrases related to historical policies or practices
Neuron Alignment
Index   Value   % of L₁
674     +0.15   0.4%
1328    +0.11   0.3%
1602    +0.09   0.3%
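The "% of L₁" column is not defined in the source. One plausible reading, assumed here, is that it gives each neuron's share of the L₁ norm of the feature's weight vector. The sketch below illustrates that reading on a made-up vector; the function names `l1_share` and `top_aligned` are hypothetical, and ranking by signed value is also an assumption.

```python
def l1_share(weights):
    """Return each weight's share of the vector's L1 norm, as a fraction."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return the k largest weights as (index, value, share) tuples.

    Assumption: entries are ranked by signed value, largest first.
    """
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Toy vector for illustration only; these are not the dashboard's weights.
toy_weights = [0.5, -1.0, 0.25, 0.25]

for index, value, share in top_aligned(toy_weights, k=2):
    print(f"{index:<6} {value:+.2f}   {share:.1%}")
```

Under this reading, the small percentages in the table would mean that no single neuron dominates the feature's direction.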
Correlated Neurons
Index   P. Corr.   Cos Sim.
1602    +0.15      0.05
757     +0.11      0.06
1892    +0.09      0.05
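The two column abbreviations are not expanded in the source. Assuming "P. Corr." means Pearson correlation and "Cos Sim." means cosine similarity, the sketch below shows how the two measures relate: Pearson correlation is cosine similarity applied after subtracting each vector's mean. Which vectors the dashboard actually compares is not stated, so the inputs here are toy data.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Toy data for illustration only: y is x shifted by a constant.
x = [1.0, 2.0, 3.0]
y = [3.0, 4.0, 5.0]

print(f"cosine similarity:   {cosine_similarity(x, y):.4f}")
print(f"Pearson correlation: {pearson_correlation(x, y):.4f}")
```

Because a constant offset changes the cosine similarity but not the Pearson correlation, the two columns need not agree even when computed on the same pair of vectors.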
Negative Logits
impelled       -0.70
Shakspeare     -0.69
endeavouring   -0.67
McLaugh        -0.66
Daven          -0.63
assailed       -0.63
endeavoured    -0.60
vainly         -0.59
exagger        -0.59
gaily          -0.59
Positive Logits
reasons        0.99
reasons        0.88
madeus         0.80
REASONS        0.78
Reasons        0.77
Reasons        0.75
purposes       0.73
raisons        0.72
affez          0.72
divertimento   0.72
Activations Density 0.229%