INDEX
Explanations
References to countries and international relations, focusing on the United States and its interactions with other nations
Neuron Alignment
Index    Value    % of L₁
1905     +0.14    0.4%
1984     +0.12    0.4%
1978     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1145     +0.14       0.04
1905     +0.12       0.03
1984     +0.11       0.04
Negative Logits
<bos>           -1.33
Rohy            -0.71
لينكات          -0.67
katkı           -0.64
Sklici          -0.63
Eksteraj        -0.61
ксана           -0.61
EconPapers      -0.61
بتاريخ          -0.60
GEBURTSDATUM    -0.60
Positive Logits
squa        1.34
mef         1.26
lara        1.25
pleins      1.23
embra       1.20
Manufact    1.20
seiz        1.19
deleter     1.18
Augu        1.17
Eft         1.17
Activations Density: 0.064%