INDEX
Explanations
political figures and their positions or relationships with other political entities
Neuron Alignment
Index   Value   % of L₁
227     +0.12   0.4%
604     +0.12   0.3%
1252    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.12      0.06
276     +0.12      0.04
704     +0.07      0.01
Negative Logits
Token             Logit
<bos>             -1.23
ัพท์                -0.59
OPHER             -0.55
__*/              -0.51
wideo             -0.51
Kohlen            -0.51
tagHelperRunner   -0.50
entery            -0.50
orrhea            -0.49
iNdEx             -0.49
Positive Logits
Token      Logit
Juf        1.34
Rine       1.32
Keny       1.20
Bartholo   1.17
McLaugh    1.14
McInt      1.13
Augu       1.13
Righ       1.12
Hez        1.10
Vaugh      1.08
Activations Density: 0.410%