Explanations
mentions of specific city names with abbreviated versions, such as "St. John's" and "St. Paul"
Neuron Alignment
Index   Value   % of L₁
554     +0.17   0.7%
397     +0.15   0.6%
411     +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
397     +0.17      0.04
554     +0.15      0.04
1562    +0.14      0.04
Negative Logits
Token             Logit
vogliono          -0.63
<bos>             -0.60
الإنجليزية        -0.59
jsi               -0.57
vanskelig         -0.53
Shetterly         -0.51
svårt             -0.51
facciamo          -0.51
dicono            -0.49
AssemblyCompany   -0.49
Positive Logits
Token      Logit
St         1.23
St         1.18
SAINT      1.08
Saint      1.05
Saint      1.01
Simult     0.99
rempliss   0.95
McInt      0.92
prétend    0.89
Rodrig     0.89
Activations Density 0.075%