INDEX
Explanations
phrases indicating the usefulness and payoff of certain cards or strategies in a game context
Neuron Alignment
Index   Value   % of L₁
674     +0.12   0.3%
273     +0.07   0.2%
386     +0.07   0.2%
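The "% of L₁" column is not defined on the page. One plausible reading, stated here as an assumption rather than the dashboard's documented definition, is each neuron's absolute alignment value divided by the sum of absolute values over all neurons. A minimal sketch of that reading follows; the function name `l1_share` and the sample vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(values):
    """Return each entry's share of the vector's L1 norm, as a fraction.

    Assumption: "% of L1" means |v_i| / sum_j |v_j|. The dashboard may
    define the column differently.
    """
    l1 = sum(abs(v) for v in values)
    if l1 == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in values]
    return [abs(v) / l1 for v in values]


# Illustrative numbers only, not the feature's real alignment vector.
weights = [0.5, -0.25, 0.25]
for index, share in enumerate(l1_share(weights)):
    print(f"neuron {index}: {share:.1%} of L1")
```

Under this reading, small percentages such as those in the table would indicate that the feature's alignment is spread thinly across many neurons rather than concentrated in a few.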
Correlated Neurons
Index   P. Corr.   Cos Sim.
273     +0.12      0.03
1428    +0.07      0.02
1411    +0.07      0.03
Negative Logits
effe     -1.27
fta      -1.25
aen      -1.23
ftu      -1.23
thut     -1.20
fte      -1.18
fto      -1.15
alre     -1.15
desir    -1.12
increa   -1.10
Positive Logits
later        1.51
future       1.33
later        1.27
Later        1.19
future       1.17
someday      1.15
Later        1.09
eventually   1.09
afterwards   1.07
eventual     1.04
Activations Density 0.467%
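The page does not say how activation density is computed. A common convention, assumed here, is the fraction of sampled tokens on which the feature's activation exceeds zero. The sketch below illustrates that convention; `activation_density`, its `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`.

    Assumption: density is the share of tokens on which the feature
    fires at all. The dashboard may use a different cutoff or sample.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative activations only: the feature fires on one of four tokens.
sample = [0.0, 0.0, 0.8, 0.0]
print(f"density: {activation_density(sample):.3%}")
```

Read this way, a density of 0.467% would mean the feature is active on roughly 1 in 214 tokens.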