INDEX
Explanations
assertive statements related to military conflicts and international relations
Neuron Alignment
Index   Value   % of L₁
1577    +0.21   0.7%
50      +0.15   0.6%
604     +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
939     +0.21      0.26
81      +0.15      0.07
16      +0.14      0.21
Negative Logits
<bos>      -1.40
:)))       -0.81
bandai     -0.75
makita     -0.74
!...       -0.72
pikachu    -0.71
funko      -0.70
occhiali   -0.70
affez      -0.67
vivace     -0.67
Positive Logits
Bartholo    0.87
Vaugh       0.86
Glou        0.83
McLaugh     0.77
clergymen   0.76
Middles     0.74
Catharine   0.74
Olof        0.73
Werth       0.70
Kild        0.70
Activations Density 8.538%