INDEX
Explanations
words related to the concept of utilization or usage
Neuron Alignment
Index   Value   % of L₁
156     +0.28   1.6%
376     +0.19   1.1%
69      +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
352     +0.28      0.02
12      +0.19      0.01
373     +0.11      0.01
Negative Logits
owski           -1.69
HEP             -1.62
MOESM           -1.57
brahim          -1.51
Publ            -1.45
supplementary   -1.45
GN              -1.44
unnumbered      -1.42
âĺħ             -1.39
ì§              -1.38
Positive Logits
ľĵ              2.33
ł               2.22
©               2.16
contradiction   2.01
°               2.00
Ļ               1.95
¨               1.88
Ĺ               1.79
Ķ               1.73
ĺ               1.72
Activations Density: 0.730%