INDEX
Explanations
phrases related to political and military contexts
Neuron Alignment
Index   Value   % of L₁
381     +0.13   0.5%
1265    +0.11   0.4%
1156    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1892    +0.13      0.06
492     +0.11      0.07
316     +0.11      0.06
Negative Logits
ftu     -1.65
fta     -1.63
aen     -1.56
sappi   -1.54
ftre    -1.54
vns     -1.54
fatis   -1.53
poff    -1.48
hcm     -1.46
desir   -1.46
Positive Logits
And       0.97
And       0.88
yes       0.78
yet       0.78
because   0.74
even      0.73
if        0.70
then      0.69
while     0.69
But       0.69
Activations Density 0.140%