Explanations
words related to Germany and political figures
Neuron Alignment
Index   Value   % of L₁
172     +0.15   0.5%
1520    +0.11   0.4%
1177    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
172     +0.15      0.04
1310    +0.11      0.03
1837    +0.11      0.04
Negative Logits
Token      Logit
Städ       -1.37
Schloß     -1.25
Distrikt   -1.12
Ewig       -1.10
Fürst      -1.09
Rathaus    -1.07
Freiw      -1.07
Orgel      -1.05
Leinwand   -1.04
Zeits      -1.04
Positive Logits
Token      Logit
Confu      1.76
unwarran   1.70
Daven      1.69
increa     1.66
Rine       1.64
McLaugh    1.62
McInt      1.62
Vaugh      1.60
Áng        1.58
Juf        1.58
Activations Density: 0.108%