Explanations
expressions of subjective opinions or personal experiences
Neuron Alignment
Index   Value   % of L₁
118     +0.15   0.9%
181     +0.11   0.6%
142     +0.10   0.6%
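A minimal sketch of how a table like this could be produced. It rests on two assumptions, neither confirmed by this page. First, "Value" is the entry of the feature's decoder vector at each neuron index. Second, "% of L₁" is that entry's absolute value divided by the decoder vector's L₁ norm. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, value, fraction of L1) for the k largest-magnitude weights.

    Assumption: "Value" is the raw decoder weight and "% of L1" is
    |weight| / sum(|w| for w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by absolute decoder weight, largest first.
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 6-dimensional decoder vector (made-up numbers, not this feature's weights).
dec = [0.02, -0.05, 0.40, 0.10, -0.30, 0.13]
for idx, value, frac in neuron_alignment(dec):
    print(f"{idx:<8}{value:+.2f}   {frac:.1%}")
```

Ranking by magnitude means a strongly negative weight can appear in the table. The toy vector's L₁ norm is 1.0, so each fraction equals the weight's absolute value.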
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
118     +0.15           0.19
404     +0.11           0.14
44      +0.10           0.04
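A hedged sketch of the two columns. It assumes both compare the feature's per-token activations with each neuron's per-token activations over the same tokens. "Pearson Corr." is taken as the correlation of the mean-centred activations. "Cos Sim." is taken as the uncentred cosine between the raw activations. This reading of "Cos Sim." is an assumption. The helper names and toy data are hypothetical.

```python
import math


def cosine(xs, ys):
    """Uncentred cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0  # treat a zero vector as uncorrelated


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return cosine([x - mx for x in xs], [y - my for y in ys])


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    feature_acts: one activation per token.
    neuron_acts: dict of neuron index -> activations over the same tokens.
    Assumption: "Cos Sim." is the uncentred cosine of the same activations.
    """
    rows = [(idx, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for idx, acts in neuron_acts.items()]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]


# Toy activations over 4 tokens (made-up numbers).
feature = [0.0, 2.0, 0.0, 1.0]
neurons = {
    7: [0.1, 1.9, 0.2, 1.1],   # tracks the feature closely
    9: [1.0, 0.0, 1.0, 0.5],   # roughly anti-correlated
    12: [0.5, 0.5, 0.5, 0.5],  # constant: no correlation, but nonzero cosine
}
for idx, r, cs in correlated_neurons(feature, neurons):
    print(f"{idx:<8}{r:+.2f}   {cs:.2f}")
```

Neuron 12 shows why the two columns can disagree. A constant neuron has zero Pearson correlation with anything, yet its uncentred cosine with a non-negative activation vector is still positive.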
Negative Logits
antly          -1.69
another        -1.66
ibilities      -1.47
ICES           -1.47
IGHT           -1.46
antic          -1.46
another        -1.45
withstanding   -1.44
LORD           -1.41
arers          -1.40
Positive Logits
↵              2.25
↵              2.25
               2.25
č↵             2.25
↵↵             2.25
↵              2.25
               2.25
↵              2.25
               2.25
↵              2.25
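A minimal sketch of how the positive and negative logit lists could be derived. It assumes each value is the feature's decoder direction projected through the unembedding matrix `W_U` (shape d_model × vocab), ignoring any final LayerNorm. The function name, matrix, and four-token vocabulary are hypothetical.

```python
def logit_effects(decoder_vec, W_U, vocab, k=2):
    """Return (k most negative, k most positive) tokens by direct logit effect.

    Assumption: effect[t] = sum_d decoder_vec[d] * W_U[d][t], i.e. the
    decoder direction read off through the unembedding (no LayerNorm).
    """
    n_vocab = len(vocab)
    effects = [sum(decoder_vec[d] * W_U[d][t] for d in range(len(decoder_vec)))
               for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy model: d_model = 2, vocabulary of 4 tokens (made-up numbers).
vocab = ["\n", " the", "antly", " another"]
decoder = [1.0, -0.5]
W_U = [
    [2.0, 0.1, -1.0, -0.5],
    [0.0, 0.2, 1.0, 1.0],
]
neg, pos = logit_effects(decoder, W_U, vocab)
print("negative:", neg)
print("positive:", pos)
```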
Activation Density: 4.215%
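A minimal sketch of the density figure. It assumes density is the fraction of tokens on which the feature's activation is strictly positive, as is typical for ReLU sparse autoencoders. The function name and toy activations are hypothetical.

```python
def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens where the feature is active (activation > threshold).

    Assumption: "active" means strictly above zero.
    """
    if not feature_acts:
        return 0.0
    return sum(1 for a in feature_acts if a > threshold) / len(feature_acts)


# Toy activations over 10 tokens (made-up numbers): the feature fires on 2.
acts = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0]
print(f"Activation density: {activation_density(acts):.3%}")
```

Read this way, a density of 4.215% means the feature fires on roughly 42 of every 1,000 tokens.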