INDEX
Explanations
terms related to political figures, government actions, and historical events
Neuron Alignment
Index   Value   % of L₁
1056    +0.10   0.3%
1639    +0.09   0.3%
1512    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1056    +0.10      0.07
1784    +0.09      0.06
1762    +0.08      0.05
Negative Logits
Token   Logit
kram    -1.10
gend    -0.99
lele    -0.99
Anm     -0.97
Nö      -0.96
utop    -0.96
teras   -0.90
duk     -0.88
mef     -0.88
„,      -0.88
Positive Logits
Token            Logit
by               0.63
viewWillAppear   0.53
alnız            0.52
poveznice        0.51
oleh             0.51
epsfig           0.51
Apesar           0.49
logements        0.49
by               0.48
dachshund        0.48
Activations Density 0.331%