INDEX
Explanations
keywords related to global politics and international agreements
Neuron Alignment
Index   Value   % of L₁
604     +0.12   0.4%
1177    +0.10   0.3%
394     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1948    +0.12      0.07
939     +0.10      0.07
1499    +0.09      0.06
Negative Logits
<bos>       -0.84
gagne       -0.56
mérite      -0.54
soigne      -0.53
marte       -0.53
boxe        -0.52
illustre    -0.51
réuss       -0.51
horizont    -0.51
constate    -0.50
Positive Logits
unwarran    0.62
coö         0.60
Áng         0.59
Viene       0.59
Darío       0.57
Tampoco     0.57
McLaugh     0.56
souverain   0.56
sovere      0.56
Exteriores  0.56
Activations Density: 0.532%