INDEX
Explanations
phrases related to hiding or concealing
Neuron Alignment
Index | Value | % of L₁
50 | +0.16 | 0.8%
874 | +0.13 | 0.6%
991 | +0.09 | 0.5%
Correlated Neurons
Index | P. Corr. | Cos Sim.
851 | +0.16 | 0.03
468 | +0.13 | 0.03
991 | +0.09 | 0.03
Negative Logits
<bos> | -2.94
ⓧ | -0.74
Vegeu | -0.68
/*** | -0.68
<? | -0.64
HasColumnType | -0.64
Kontrola | -0.62
(blank) | -0.61
Юлия | -0.60
got | -0.59
Positive Logits
wien | 1.28
stockholm | 1.24
myn | 1.21
bordeaux | 1.19
maroc | 1.19
dises | 1.18
thut | 1.18
bayern | 1.17
aen | 1.17
fua | 1.17
Activations Density 0.162%