INDEX
Explanations
expressions of time and long-term plans or events
Neuron Alignment
Index    Value    % of L₁
369      +0.27    1.6%
436      +0.17    1.0%
419      +0.16    0.9%
Correlated Neurons
Index    P. Corr.    Cos Sim.
436      +0.27       0.08
369      +0.17       0.03
103      +0.16       0.10
Negative Logits
himself    -1.52
ament      -1.52
eners      -1.38
eu         -1.32
himself    -1.29
kin        -1.24
aya        -1.22
Episode    -1.22
lan        -1.20
hostage    -1.19
Positive Logits
↵                 2.11
(blank)           2.11
č↵                2.11
<|outofrange|>    2.11
<|outofrange|>    2.11
↵                 2.11
(blank)           2.11
↵                 2.11
č↵                2.11
č↵                2.11
Activations Density 2.940%