INDEX
Explanations
New Auto-Interp: references to abstract concepts or unspecified ideas
Neuron Alignment
Index  Value  % of L₁
246    +0.14  0.8%
365    +0.12  0.7%
263    +0.11  0.6%
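The dashboard does not say how the neuron-alignment table is computed. The following is a minimal sketch that assumes Value is the feature's decoder weight on each MLP neuron and % of L₁ is the absolute value of that weight as a share of the decoder vector's L₁ norm. The function name `neuron_alignment` and its inputs are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, weight, % of L1 norm) for the k largest-magnitude weights.

    Assumption (not stated by the dashboard): decoder_vec is the feature's
    decoder direction expressed in the MLP-neuron basis.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector has zero L1 norm")
    # Rank neurons by |weight|, largest first; ties keep their original order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]
```

For example, `neuron_alignment([1.0, -4.0, 3.0, 2.0], k=2)` returns `[(1, -4.0, 40.0), (2, 3.0, 30.0)]`.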
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
246    +0.14          0.04
360    +0.12          0.03
230    +0.11          0.02
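The dashboard also does not define its correlation and cosine-similarity columns. The following is a minimal sketch that assumes both columns compare the feature's activations with a neuron's activations over the same set of tokens. The cosine column could instead compare weight vectors, and this page does not show which. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # convention chosen here: a constant series counts as uncorrelated
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


def top_correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    Assumption: neuron_acts[i] holds neuron i's activations on the same tokens
    as feature_acts. Returns (neuron index, Pearson r, cosine similarity) tuples.
    """
    scored = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for i, acts in enumerate(neuron_acts)]
    scored.sort(key=lambda t: t[1], reverse=True)  # highest correlation first
    return scored[:k]
```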
Negative Logits
¥    -3.72
Ļª   -3.69
§    -3.21
¬    -3.19
µ    -3.17
Ń    -3.13
·    -3.12
Ī    -3.12
¿½   -3.02
ĺ    -3.01
Positive Logits
else          2.83
resembling    2.01
productive    1.87
ELSE          1.81
like          1.73
Else          1.73
positive      1.60
akin          1.58
acidic        1.55
constructive  1.55
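Ranked logit lists like these can be produced in several ways, and the dashboard does not state which method it uses. The following is a minimal sketch that assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. This is a common direct-logit approximation that ignores the final layer norm. `logit_weights` and the toy `unembed` mapping are hypothetical.

```python
def logit_weights(decoder_vec, unembed, k=10):
    """Return the k highest- and k lowest-scoring tokens for a feature direction.

    Assumption: unembed maps each token string to its unembedding vector
    (same length as decoder_vec); score = dot(decoder_vec, unembedding).
    """
    scores = {tok: sum(a * b for a, b in zip(decoder_vec, vec))
              for tok, vec in unembed.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative
```

With a toy two-dimensional unembedding, `logit_weights([1.0, 0.0], {"else": [2.0, 1.0], "¥": [-3.0, 0.5], "like": [1.0, 9.0]}, k=1)` returns `([("else", 2.0)], [("¥", -3.0)])`.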
Activation Density: 0.100%
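This sketch reads activation density as the percentage of tokens on which the feature fires. Both that reading and the firing threshold of zero are assumptions, and `activation_density` is a hypothetical helper.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold (assumed 0.0)."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)
```

Under this reading, a density of 0.100% means the feature is active on about one token in a thousand. For example, `activation_density([0.0] * 999 + [0.7])` returns `0.1`.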