INDEX
Explanations
the term "Department" in various contexts
Neuron Alignment
Index   Value   % of L₁
182     +0.13   0.8%
343     +0.13   0.7%
316     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
145     +0.13      0.01
343     +0.13      0.01
316     +0.12      0.01
Negative Logits
latter        -1.89
usual         -1.62
)">           -1.59
discouraged   -1.58
word          -1.55
verb          -1.50
threatened    -1.49
forbidden     -1.47
mercy         -1.45
questioned    -1.44
Positive Logits
¾       2.12
yards   2.00
»¿      1.99
º       1.77
aho     1.73
mans    1.71
İ       1.69
rooms   1.69
yard    1.66
aceut   1.65
Activations Density: 0.043%