INDEX
Explanations
instances of comparisons or contrasts in complex language
Neuron Alignment
Index   Value   % of L₁
240     +0.27   1.6%
156     +0.22   1.3%
369     +0.21   1.2%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
240     +0.27           0.03
369     +0.22           0.03
125     +0.21           0.04
Negative Logits
teenth                                -1.49
Fig                                   -1.48
documentclass                         -1.34
^âĪĴ                                  -1.32
0000000000000000000000000000000000    -1.29
nolimits                              -1.28
trib                                  -1.27
amusement                             -1.27
urane                                 -1.26
ftime                                 -1.26
Positive Logits
²     2.64
Ĵ     2.58
»     2.52
Ĭ     2.52
º     2.52
¢     2.52
Ľ     2.51
ĸ´    2.50
¸     2.49
Ĺ     2.49
Activations Density: 1.430%