INDEX
Explanations
mentions of countries and political situations
Neuron Alignment
Index   Value   % of L₁
1984    +0.12   0.4%
1741    +0.11   0.3%
1177    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1984    +0.12      0.07
1654    +0.11      0.06
849     +0.11      0.06
Negative Logits
tramont   -0.98
ivi       -0.90
circon    -0.90
territo   -0.85
liev      -0.85
ordina    -0.83
allarg    -0.83
erec      -0.81
parati    -0.80
umbro     -0.80
Positive Logits
'          0.79
’          0.79
itself     0.64
‘          0.53
awaiter    0.52
naphthal   0.52
India      0.50
Ireland    0.49
Russia     0.49
India      0.48
Activations Density 0.252%