INDEX
Explanations
legal terms and operational conditions
Neuron Alignment
Index   Value   % of L₁
235     +0.14   0.8%
212     +0.13   0.7%
400     +0.12   0.7%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
212     +0.14           0.07
355     +0.13           0.06
400     +0.12           0.03
Negative Logits
Token   Logit
ĭ       -2.61
Ń       -2.52
Ļª      -2.29
İ       -2.26
ģ       -2.23
ī       -2.23
ĩ       -2.21
»¿      -2.15
·¸      -2.13
Ĥ       -2.10
Positive Logits
Token     Logit
APTER     1.51
Cookie    1.42
holds     1.40
ename     1.34
belongs   1.31
OP        1.28
concer    1.28
ague      1.26
ISH       1.25
lies      1.22
Activations Density: 0.880%