Explanations
names of political figures
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
303     +0.15   0.7%
1983    +0.14   0.6%
1937    +0.14   0.6%
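The dashboard does not state how these columns are computed. A minimal sketch, assuming "Value" is the feature's decoder weight on each neuron and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector (both are assumptions; `neuron_alignment` is a hypothetical helper and the toy vector is illustrative):

```python
def neuron_alignment(decoder_vec, top_k=3):
    """Hypothetical helper: top neurons by decoder weight, with their share of the L1 norm.

    Assumes "Value" is decoder_vec[i] and "% of L1" is |decoder_vec[i]| / ||decoder_vec||_1.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by signed weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]


# Toy decoder vector over four neurons (illustrative values, not from the dashboard).
print(neuron_alignment([0.25, -0.5, 1.0, 0.25], top_k=2))
# [(2, 1.0, 0.5), (0, 0.25, 0.125)]
```

Multiplying the third element by 100 gives the percentage shown in the table.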
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
981     +0.15           0.05
227     +0.14           0.05
314     +0.14           0.02
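A minimal sketch of the two columns, reading the correlation column as a Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and assuming the cosine column compares the feature's decoder vector with that neuron's one-hot basis direction. Both readings are assumptions, and `pearson` and `cosine_with_neuron` are hypothetical helpers:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_with_neuron(decoder_vec, neuron_index):
    """Cosine similarity between decoder_vec and one neuron's one-hot basis vector.

    For a one-hot vector this reduces to decoder_vec[i] / ||decoder_vec||_2.
    """
    norm = math.sqrt(sum(w * w for w in decoder_vec))
    return decoder_vec[neuron_index] / norm


print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
print(cosine_with_neuron([3.0, 4.0], 1))  # 0.8
```

Under these assumptions the two columns can disagree: a neuron can fire alongside the feature (high correlation) while carrying little of the feature's decoder weight (low cosine).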
Negative Logits
Abril      -0.63
Nein       -0.57
Gör        -0.54
Minha      -0.53
softmax    -0.53
Jawaban    -0.52
Gibt       -0.51
Saiba      -0.51
inflater   -0.51
Agosto     -0.50
Positive Logits
fatis      1.06
ftu        1.06
dises      1.01
fta        1.00
fuf        1.00
guarante   0.99
vns        0.95
mépris     0.95
fup        0.95
fep        0.94
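The two logit lists presumably rank vocabulary tokens by the feature's direct effect on the output logits. A minimal sketch, assuming the feature's direction has already been projected into the residual stream and each token's weight is the dot product of that direction with the token's unembedding vector; `logit_weights`, `top_and_bottom_logits`, and the toy vocabulary are hypothetical:

```python
def logit_weights(direction, unembed):
    """Dot product of a residual-stream direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom_logits(weights, k=10):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    positive = ranked[::-1][:k]
    negative = ranked[:k]
    return positive, negative


# Toy two-dimensional unembedding (illustrative values only).
unembed = {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0], "d": [-3.0, 0.0]}
print(top_and_bottom_logits(logit_weights([1.0, 0.0], unembed), k=2))
# ([('a', 2.0), ('c', 0.5)], [('d', -3.0), ('b', -1.0)])
```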
Activation Density: 0.280%
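Activation density is commonly the fraction of tokens on which a feature fires; under that reading, 0.280% means the feature is active on roughly 28 of every 10,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero (`activation_density` and its `threshold` parameter are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 0.25
```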