Explanations
verb phrases related to physical actions
Neuron Alignment
Index    Value    % of L₁
1842     +0.11    0.3%
1385     +0.09    0.2%
690      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
392      +0.11       0.04
736      +0.09       0.06
100      +0.08       0.05
Negative Logits
fto        -0.65
dora       -0.54
jep        -0.52
sii        -0.52
fei        -0.52
fta        -0.51
demokra    -0.51
inder      -0.51
purcha     -0.50
nij        -0.50
Positive Logits
opponents      0.77
foes           0.75
enemies        0.75
opponent       0.73
enemy          0.71
getTarget      0.64
*++            0.60
Enemies        0.59
targets        0.59
adversaries    0.59
Activations Density: 0.520%