INDEX
Explanations
conditional statements expressing hypothetical scenarios
Neuron Alignment
Index   Value   % of L₁
258     +0.14   0.8%
478     +0.13   0.7%
486     +0.13   0.7%
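
The "% of L₁" column can plausibly be read as each neuron's share of the L₁ norm of the feature's direction in neuron space; the listed values are roughly consistent with that reading, but the dashboard does not state the formula, so treat it as an assumption. A minimal sketch under that assumption follows. The function name `top_neuron_alignment` and the toy vector are hypothetical and are not taken from the dashboard.

```python
import numpy as np

def top_neuron_alignment(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| divided by the sum of absolute
    values over the whole direction vector.
    """
    direction = np.asarray(direction, dtype=float)
    l1 = np.abs(direction).sum()
    if l1 == 0:
        raise ValueError("direction has zero L1 norm")
    # Sort indices by descending absolute value and keep the first k.
    top = np.argsort(-np.abs(direction))[:k]
    return [(int(i), float(direction[i]), float(abs(direction[i]) / l1)) for i in top]

# Toy vector for illustration only (not data from the dashboard).
toy = np.array([0.5, -0.25, 0.125, 0.125])
rows = top_neuron_alignment(toy, k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Small shares like the ones in the table would indicate, on this reading, that the feature is spread across many neurons rather than concentrated in a few.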
Correlated Neurons
Index   P. Corr.   Cos Sim.
486     +0.14      0.05
236     +0.13      0.06
481     +0.13      0.05
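
Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity, the two columns measure different things: Pearson correlation centers both inputs before comparing them, while cosine similarity compares raw directions. What the dashboard actually feeds into each measure (activations, weights, or something else) is not stated here. The sketch below only illustrates the two standard formulas; the function names are hypothetical.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation: cosine of the angle between mean-centered vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine_sim(u, v):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy inputs for illustration only (not data from the dashboard).
print(pearson_corr([1, 2, 3], [2, 4, 6]))
print(cosine_sim([1, 0], [0, 1]))
```

Both functions divide by a norm, so they are undefined for constant inputs (Pearson) or zero vectors (cosine); a production version would need to guard against that.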
Negative Logits
»¿    -2.60
Ļª    -2.42
ĻĤ    -2.27
ĩ     -2.26
Ĩ     -2.08
ĭ     -2.08
ľĵ    -2.04
¡     -2.03
¼     -2.00
Ħ     -1.97
Positive Logits
iox          1.95
icar         1.61
duplicate    1.53
untimely     1.48
zia          1.43
dye          1.42
sacrifice    1.41
prob         1.39
ffen         1.38
ismiss       1.36
Activations Density: 0.189%