INDEX
Explanations
mentions of China and related terms
Neuron Alignment
Index   Value   % of L₁
50      +0.28   1.4%
1124    +0.15   0.7%
1103    +0.14   0.7%
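One plausible reading of the columns above is that Value is the feature's weight on a given neuron and % of L₁ is that weight's share of the whole weight vector's L₁ norm. The Python sketch below works under that assumption only; the function name `top_aligned_neurons` and the toy vector are hypothetical and are not taken from the dashboard's own code.

```python
import numpy as np

def top_aligned_neurons(weight_vec, k=3):
    """Return (index, value, share of L1 norm) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| divided by the sum of absolute values.
    """
    weight_vec = np.asarray(weight_vec, dtype=float)
    l1_norm = np.abs(weight_vec).sum()
    # Sort by descending magnitude and keep the first k indices.
    order = np.argsort(-np.abs(weight_vec))[:k]
    return [
        (int(i), float(weight_vec[i]), float(abs(weight_vec[i]) / l1_norm))
        for i in order
    ]

# Toy vector with hypothetical values, printed in the dashboard's column order.
toy = [0.5, -0.25, 0.125, 0.125]
for index, value, share in top_aligned_neurons(toy, k=2):
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

Under this reading, a top value of +0.28 at 1.4% would imply an L₁ norm of roughly 20 for the full vector.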
Correlated Neurons
Index   P. Corr.   Cos Sim.
1103    +0.28      0.04
1124    +0.15      0.03
395     +0.14      0.03
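P. Corr. presumably abbreviates Pearson correlation (here, between two sets of activations over the same tokens) and Cos Sim. cosine similarity (between two vectors). Which quantities the dashboard actually compares is not stated, so the Python sketch below only illustrates the two measures themselves; the function names and toy inputs are hypothetical.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine_sim(u, v):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy inputs (hypothetical): perfectly correlated activations, orthogonal vectors.
print(pearson_corr([1, 2, 3], [2, 4, 6]))
print(cosine_sim([1, 0], [0, 1]))
```

The two measures can disagree, as in the table above: a neuron may correlate noticeably with a feature in activation while the corresponding vectors are nearly orthogonal.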
Negative Logits
<bos>     -2.16
ⓧ         -0.66
xiu       -0.65
guang     -0.64
          -0.62
qian      -0.62
xuan      -0.60
<?        -0.59
anyuan    -0.57
sheng     -0.56
Positive Logits
Kün         1.05
China       1.02
Bartholo    1.01
Khart       0.99
China       0.99
Chines      0.99
Schrö       0.97
china       0.95
Tarragona   0.94
chinese     0.91
Activations Density 0.060%