INDEX
Explanations
specific phrases indicating locations or contexts
Neuron Alignment
Index   Value   % of L₁
441     +0.13   0.7%
316     +0.11   0.7%
364     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
441     +0.13      0.08
316     +0.11      0.06
170     +0.11      0.06
Negative Logits
Token     Logit
cheon     -1.54
neck      -1.49
amble     -1.44
SEC       -1.41
oplasma   -1.38
iplex     -1.37
]):       -1.36
icorn     -1.34
uber      -1.33
ouin      -1.31
Positive Logits
Token   Logit
ı       3.63
Ĩ       3.44
ĥ½      3.43
Ļ       3.41
Ģ       3.40
¹       3.28
¾       3.24
¤       3.16
ħ       3.15
İ       3.10
Activations Density: 0.056%