INDEX
Explanations
terms related to comparisons or similarities between different entities
Neuron Alignment
Index   Value   % of L₁
122     +0.08   0.2%
581     +0.07   0.2%
1473    +0.07   0.2%
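The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The sketch below illustrates that reading on a toy vector. The function name `l1_share` and the toy values are hypothetical, and the interpretation of the column is an assumption, not something the page states.

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1_norm for w in weights]


# Toy weight vector (hypothetical, not taken from the table above).
toy_weights = [0.5, -0.25, 0.25]
shares = l1_share(toy_weights)

for index, share in enumerate(shares):
    print(f"neuron {index}: {share:.1%} of L1")
```

Under this reading, a share as small as 0.2% for the top-ranked neurons would suggest that the feature's weight is spread across many neurons instead of being concentrated in a few.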
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
814     +0.08           0.02
656     +0.07           0.03
1957    +0.07           0.02
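The two columns above measure different things: Pearson correlation removes each vector's mean before comparing, while cosine similarity compares the raw vectors. A minimal sketch of both follows, using only the Python standard library. Which vectors the dashboard actually compares (for example, activations over a token sample versus weight vectors) is an assumption left open here, and the function names and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy data (hypothetical): b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0, 4.0]
b = [11.0, 12.0, 13.0, 14.0]

pearson = pearson_correlation(a, b)
cosine = cosine_similarity(a, b)
print(f"Pearson: {pearson:.3f}, cosine: {cosine:.3f}")
```

Because the offset is removed by mean-centering, the Pearson value for the toy data is 1 up to rounding, while the cosine value stays below 1. This is why the two columns need not agree.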
Negative Logits
alkoh   -1.11
lele    -0.96
oner    -0.94
fta     -0.93
nece    -0.92
antik   -0.91
kac     -0.89
igno    -0.89
uhr     -0.88
ert     -0.88
Positive Logits
same        0.77
same        0.77
Same        0.73
Same        0.68
SAME        0.64
applies     0.63
الاطلاع     0.54
apply       0.54
SAME        0.53
Similarly   0.53
Activations Density: 0.118%
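Activation density is commonly taken to mean the fraction of sampled tokens on which the feature fires. The sketch below shows that calculation on toy data. The function name `activation_density`, the zero threshold, and the sample values are assumptions for illustration, not the dashboard's confirmed method.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of sampled positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active_count = sum(1 for value in activations if value > threshold)
    return active_count / len(activations)


# Toy activations over eight tokens (hypothetical): two of them are nonzero.
toy_activations = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
density = activation_density(toy_activations)
print(f"density: {density:.3%}")
```

On this definition, a density of 0.118% would correspond to the feature firing on roughly one token in 850.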