Explanations
repeated use of the verb "was."
Neuron Alignment
Index   Value   % of L₁
494     +0.14   0.8%
462     +0.13   0.7%
7       +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
32      +0.14      0.14
146     +0.13      0.12
335     +0.12      0.11
Negative Logits
Token   Logit
urg     -1.73
RS      -1.72
nd      -1.58
LT      -1.57
GM      -1.52
EV      -1.50
G       -1.50
м       -1.48
mean    -1.47
DD      -1.45
Positive Logits
Token   Logit
Ļª      3.84
↵       3.49
↵       3.48
↵       3.48
↵       3.48
č↵      3.48
↵↵      3.48
↵       3.48
↵↵      3.48
↵       3.48
Activations Density 0.332%