INDEX
Explanations
mentions of political leaders or significant government figures
Neuron Alignment
Index   Value   % of L₁
1589    +0.09   0.3%
1001    +0.08   0.2%
1479    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1001    +0.09      0.06
1336    +0.08      0.05
1579    +0.08      0.04
Negative Logits
fta     -1.42
effe    -1.37
ftu     -1.35
thut    -1.35
aen     -1.34
fep     -1.31
fatis   -1.30
mef     -1.29
secon   -1.29
fte     -1.28
Positive Logits
himself     1.09
'           0.78
’           0.77
himself     0.74
herself     0.74
Himself     0.71
׳           0.70
cellor      0.69
president   0.67
who         0.66
Activations Density: 0.288%