INDEX
Explanations
mentions of political positions and historical events
Neuron Alignment
Index    Value    % of L₁
1343     +0.20    0.6%
1967     +0.16    0.5%
1741     +0.16    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
588      +0.20       0.04
1343     +0.16       0.04
1274     +0.16       0.03
Negative Logits
parteci      -1.11
minimalis    -0.95
alkoh        -0.88
intende      -0.88
kriminal     -0.86
kön          -0.84
trovo        -0.84
voleva       -0.84
kosme        -0.83
keramik      -0.83
Positive Logits
and           0.84
(blank)       0.80
etc           0.77
pathfinder    0.68
,             0.67
&             0.64
,             0.63
etc           0.62
or            0.61
и             0.60
Activations Density 0.145%