INDEX
Explanations
mentions of countries or nationalities
Neuron Alignment
Index   Value   % of L₁
478     +0.12   0.4%
1967    +0.11   0.4%
1741    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1512    +0.12      0.04
939     +0.11      0.05
1048    +0.11      0.03
Negative Logits
kahit         -0.60
ļ             -0.58
adicionais    -0.58
NOWLED        -0.58
proszę        -0.58
froh          -0.55
ypeł          -0.55
mī            -0.54
izvē          -0.54
spania        -0.53
Positive Logits
liberality    0.71
Kün           0.70
Karsten       0.69
Schrö         0.69
Katrin        0.65
implacable    0.64
ingrat        0.64
Mathilde      0.64
Henk          0.63
Epif          0.63
Activations Density: 0.174%