INDEX
Explanations
negative or null indicators in code or technical texts
Neuron Alignment
Index   Value   % of L₁
121     +0.15   0.8%
355     +0.14   0.8%
99      +0.12   0.7%
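The dashboard does not say how these numbers are derived. Below is a minimal sketch, assuming Neuron Alignment lists the neurons with the largest positive weights in the feature's decoder vector and that "% of L₁" is each weight's absolute value divided by the L₁ norm of that vector. The helper `neuron_alignment` and the toy input are hypothetical.

```python
def neuron_alignment(decoder_vector, k=3):
    """Hypothetical helper: top-k neurons by signed decoder weight.

    Assumes "% of L1" means |weight| / sum(|w| for w in decoder_vector).
    """
    l1_norm = sum(abs(w) for w in decoder_vector)
    # Sort neuron indices by weight, largest positive first (ties keep input order).
    ranked = sorted(range(len(decoder_vector)),
                    key=lambda i: decoder_vector[i], reverse=True)
    return [(i, decoder_vector[i], abs(decoder_vector[i]) / l1_norm)
            for i in ranked[:k]]


# Toy decoder vector over 5 neurons (made-up numbers, not from this feature).
for index, value, share in neuron_alignment([0.25, -0.5, 0.125, 0.0625, 0.0625], k=2):
    print(f"{index}  {value:+.2f}  {share:.1%}")
```

Under this reading, the small shares in the table (0.8% or less) mean no single neuron carries much of the decoder vector's weight.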
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
448     +0.15           0.20
316     +0.14           0.18
390     +0.12           0.18
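A minimal sketch of how a Correlated Neurons table could be produced, assuming both columns compare the feature's per-token activations with each neuron's per-token activations: Pearson correlation (centered) and cosine similarity (uncentered). The helper names and toy data are hypothetical, and the actual dashboard may compute these differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(xs, ys):
    """Uncentered cosine similarity of two equal-length sequences."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    return dot / (nx * ny) if nx and ny else 0.0


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with the feature.

    feature_acts: the feature's activation on each token.
    neuron_acts: one list per neuron, holding activations on the same tokens.
    """
    scores = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for j, acts in enumerate(neuron_acts)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]


# Toy activations over 4 tokens for 2 neurons (made-up numbers).
feature = [0.0, 1.0, 0.0, 2.0]
neurons = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 4.0]]
for j, corr, cos in correlated_neurons(feature, neurons, k=2):
    print(f"{j}  {corr:+.2f}  {cos:.2f}")
```

Cosine similarity does not subtract the mean, so for sparse, mostly-zero activations it can differ noticeably from the Pearson value, as in the table above.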
Negative Logits
walks          -1.60
lessly         -1.54
unlocked       -1.53
marketplace    -1.53
refrigerator   -1.47
fut            -1.36
mirrors        -1.35
fridge         -1.32
walk           -1.30
bracket        -1.27
Positive Logits
Ĩ      2.21
¿½     2.12
·¸     2.05
Į      1.89
ķ      1.87
Ŀ      1.78
ĨĴ     1.70
¿      1.68
ı      1.64
¾      1.59
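A minimal sketch of where the two logit lists could come from, assuming each value is the feature's decoder direction projected through the model's unembedding matrix, giving one number per vocabulary token; the most negative and most positive entries form the two lists. The helper `logit_effects`, its argument layout, and the toy model are hypothetical, and the sketch ignores any final LayerNorm a real dashboard might account for.

```python
def logit_effects(decoder_vector, unembed, vocab, k=3):
    """Hypothetical helper: tokens whose logits the feature direction most lowers / raises.

    decoder_vector: feature direction in the residual stream (length d_model).
    unembed: d_model rows, each with one entry per vocabulary token.
    Ignores any final LayerNorm (an assumption of this sketch).
    """
    d_model, n_vocab = len(decoder_vector), len(vocab)
    # effects[t] = sum over i of decoder_vector[i] * unembed[i][t]
    effects = [sum(decoder_vector[i] * unembed[i][t] for i in range(d_model))
               for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
neg, pos = logit_effects([1.0, 0.0],
                         [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]],
                         ["a", "b", "c"], k=1)
print("negative:", neg)
print("positive:", pos)
```

Tokens such as "Ĩ" or "¿½" in the positive list are byte-level tokenizer pieces rather than rendering errors, so they are kept as shown.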
Activations Density 0.821%
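A minimal sketch, assuming activation density is the fraction of tokens on which the feature's activation is above zero. Under that reading, 0.821% means the feature fires on roughly 8 of every 1,000 tokens. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens where the feature fires (activation > threshold)."""
    return sum(1 for a in feature_acts if a > threshold) / len(feature_acts)


# Toy activations over 8 tokens (made-up numbers): 2 of the 8 are nonzero.
density = activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activations Density {density:.3%}")
```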