INDEX
Explanations
phrases related to permission or authorization
Neuron Alignment
Index    Value    % of L₁
376      +0.13    0.7%
260      +0.13    0.7%
472      +0.12    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
231      +0.13       0.03
260      +0.13       0.03
183      +0.12       0.03
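
The two column headers in the Correlated Neurons table are not defined on this page. A common reading, assumed here, is that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two direction vectors. The sketch below shows how those two quantities are computed in general; the function names and the toy inputs are hypothetical and are not taken from the dashboard.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant sequence has zero variance; report 0.0 instead of dividing by zero.
    return num / den if den else 0.0

def cosine_sim(u, v):
    """Cosine similarity of two equal-length direction vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# Toy data (hypothetical): activations over five tokens, and two weight vectors.
feature_acts = [0.0, 0.2, 0.0, 1.1, 0.4]
neuron_acts = [0.1, 0.3, 0.0, 0.9, 0.2]
print(pearson_corr(feature_acts, neuron_acts))
print(cosine_sim([1.0, 0.0, 2.0], [0.5, 1.0, 1.0]))
```

Under that reading, the two columns measure different things: Pearson correlation compares activation patterns across tokens after removing the mean, while cosine similarity compares directions in weight space. A neuron can therefore score +0.13 on the first and only 0.03 on the second.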
Negative Logits
hey       -1.65
feb       -1.64
CSF       -1.59
uld       -1.59
hend      -1.54
NT        -1.50
MSO       -1.50
ftime     -1.49
Prefab    -1.49
ubert     -1.49
Positive Logits
ĻĤ        2.62
ħ         2.49
ģ         2.33
ī         2.21
ĺ         2.20
Ĵ         2.19
ĥ         2.17
↵         2.17
↵↵↵       2.17
↵↵        2.17
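
Several of the positive-logit tokens (ĻĤ, ħ, ģ and so on) look like the printable stand-ins that byte-level BPE tokenizers use for raw bytes. The page does not say which tokenizer the model uses, so the sketch below assumes the GPT-2 byte-to-unicode table and inverts it to recover the bytes; `token_to_bytes` and `is_utf8_continuation` are hypothetical helper names.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as in GPT-2's byte-level BPE (assumed here)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points starting at 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))

# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def token_to_bytes(token):
    """Recover the raw bytes behind a displayed byte-level token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)

def is_utf8_continuation(byte):
    """True for bytes of the form 10xxxxxx (0x80 to 0xBF)."""
    return 0x80 <= byte <= 0xBF

for tok in ["ĻĤ", "ħ", "ģ", "ī", "ĺ", "Ĵ", "ĥ"]:
    raw = token_to_bytes(tok)
    print(tok, raw.hex(), all(is_utf8_continuation(b) for b in raw))
```

If the GPT-2 assumption holds, every one of these tokens decodes to bytes in the range 0x80 to 0xBF, which are UTF-8 continuation bytes: fragments of multi-byte characters, not standalone text. The ↵ entries are the dashboard's rendering of newline tokens and are not covered by this table lookup.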
Activations Density 0.651%