INDEX
Explanations
phrases related to political statements and actions
Neuron Alignment
Index  Value  % of L₁
1177   +0.11  0.3%
1415   +0.09  0.3%
1328   +0.08  0.2%
Correlated Neurons
Index  P. Corr.  Cos Sim.
1415   +0.11     0.04
1328   +0.09     0.05
1919   +0.08     0.05
Negative Logits
Intere    -1.66
guarante  -1.66
fta       -1.61
increa    -1.57
purcha    -1.53
vété      -1.53
triomphe  -1.53
»>        -1.53
matel     -1.51
encomp    -1.50
Positive Logits
be              0.73
complexContent  0.70
Garantía        0.70
Väl             0.69
Viited          0.69
للاسماء  0.67
Gobierno        0.66
continue        0.66
المصادر  0.65
stay            0.65
Activations Density: 0.258%