Explanations
activities or states of being related to personal experiences
Neuron Alignment
Index   Value   % of L₁
23      +0.27   1.5%
31      +0.12   0.7%
435     +0.11   0.6%
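The alignment columns can be reproduced with a few lines of Python. This is a minimal sketch, assuming that "Value" is a signed component of the feature's weight vector in the neuron basis and that "% of L₁" is that component's magnitude divided by the L₁ norm of the whole vector. The function name `neuron_alignment` is hypothetical. Under that reading, the three rows above are consistent with a single L₁ norm of roughly 18 (for example, 0.27 / 1.5% ≈ 18).

```python
def neuron_alignment(weights, k=3):
    """Return the k largest-magnitude components of a feature's weight vector.

    Hypothetical helper. Each entry is (neuron index, signed value, share of
    the vector's L1 norm), assuming the dashboard ranks by absolute value.
    """
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by the magnitude of their weight, largest first.
    top = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)[:k]
    return [(i, weights[i], abs(weights[i]) / l1) for i in top]


# Usage on a toy 4-neuron weight vector.
for index, value, share in neuron_alignment([0.0, 0.5, -0.3, 0.2]):
    print(f"{index:<6}{value:+.2f}  {share:.1%}")
```

Ranking by magnitude rather than signed value means strongly negative components also appear, with their sign kept in the "Value" column.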
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
149     +0.27           0.04
384     +0.12           0.03
346     +0.11           0.01
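The two correlation columns measure different things, which is how neuron 149 can track the feature (+0.27) while its weights are nearly orthogonal to it (0.04). A minimal sketch, assuming the correlation column is a Pearson correlation computed over per-token activations of the feature and the neuron, and the cosine-similarity column compares their weight vectors. Both helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Usage: activations over 4 tokens, then two toy weight vectors.
print(pearson([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```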
Negative Logits
↵                 -4.60
↵                 -4.60
                  -4.60
<|outofrange|>    -4.60
                  -4.60
                  -4.60
<|outofrange|>    -4.60
↵                 -4.60
↵                 -4.60
↵                 -4.60
Positive Logits
rVert       1.92
ITED        1.73
opically    1.71
?"          1.68
ized        1.67
leftarrow   1.64
suspicion   1.63
now         1.63
rceil       1.62
ally        1.61
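The logit lists rank vocabulary tokens by the feature's direct effect on the output logits. A minimal sketch, assuming that effect is the feature's output direction multiplied by the unembedding matrix W_U with no layer norm applied. `logit_effects` and its arguments are hypothetical, and plain lists stand in for tensors.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Rank tokens by the direct effect of a feature direction on the logits.

    Hypothetical helper.
    direction: the feature's output vector, length d_model.
    unembed: d_model rows, each a list of len(vocab) values (W_U).
    Returns (top_positive, top_negative) lists of (token, logit) pairs.
    """
    n_vocab = len(vocab)
    # logits[t] = sum_i direction[i] * unembed[i][t], i.e. direction @ W_U
    logits = [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


# Usage: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = logit_effects([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)
```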
Activation Density: 0.216%
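Activation density is the share of tokens on which the feature is active. A minimal sketch, assuming "active" means a strictly positive activation. `activation_density` is a hypothetical helper. At 0.216%, the feature fires on about 2 of every 1,000 tokens.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature fires (activation > 0)."""
    return sum(1 for a in activations if a > 0) / len(activations)


# Usage: 2 of 5 toy tokens have a positive activation.
print(f"{activation_density([0.0, 0.3, 0.0, 0.0, 1.2]):.3%}")
```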