Explanations
mentions of a specific country or nationality related to political or social topics
Neuron Alignment
Index   Value   % of L₁
394     +0.17   0.6%
1150    +0.17   0.5%
478     +0.16   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.17      0.09
964     +0.17      0.06
446     +0.16      0.05
Negative Logits
préc       -0.97
silikon    -0.88
exé        -0.81
duquel     -0.78
représ     -0.78
pama       -0.77
recev      -0.76
kompres    -0.76
kule       -0.76
prét       -0.74
Positive Logits
apprehen     0.77
gaily        0.67
indescri     0.67
overjoyed    0.66
luxuriant    0.64
fringed      0.63
vainly       0.62
ⓧ            0.62
loveliness   0.62
boughs       0.62
Activations Density 0.588%