INDEX
Explanations
references to various cities or locations, particularly those with 'San' in their name
Neuron Alignment
Index   Value   % of L₁
1472    +0.20   0.8%
892     +0.15   0.6%
950     +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1472    +0.20      0.05
892     +0.15      0.04
950     +0.14      0.04
Negative Logits
gratify        -0.72
quivering      -0.60
vexed          -0.59
earnestness    -0.58
endeavouring   -0.58
špat           -0.57
kindled        -0.56
moż            -0.56
weariness      -0.56
attainments    -0.55
Positive Logits
San        1.42
San        1.41
san        1.22
SAN        1.19
silikon    1.18
exé        1.12
kosme      1.10
utop       1.07
san        1.05
simplif    1.05
Activations Density 0.057%
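
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below illustrates that reading only: the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, with the feature firing on 6 of them.
acts = [0.0] * 9994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.057% would mean the feature fires on roughly 6 of every 10,000 tokens, which fits a narrow feature keyed to tokens such as "San".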