INDEX
Explanations
expressions of satisfaction or dissatisfaction
Neuron Alignment
Index    Value    % of L₁
376      +0.18    1.0%
380      +0.13    0.7%
9        +0.11    0.6%
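The "% of L₁" column appears to give each neuron's value as a share of an L₁ norm (a sum of absolute values). The sketch below only checks that reading against the three rows shown. The function name `l1_share` and the norm of 18.0 are assumptions: the norm is not stated in the source and is back-solved, roughly, from the first row.

```python
def l1_share(value, l1_norm):
    """Return |value| as a percentage of the given L1 norm."""
    return 100.0 * abs(value) / l1_norm

# ASSUMPTION: the L1 norm is not given in the source. 18.0 is back-solved
# from the first row (0.18 is 1.0% of 18) and is only approximate.
ASSUMED_L1 = 18.0

# (index, value) pairs copied from the Neuron Alignment table.
rows = [(376, 0.18), (380, 0.13), (9, 0.11)]

for index, value in rows:
    print(index, f"{l1_share(value, ASSUMED_L1):.1f}%")
```

Under that assumed norm the three rows come out at 1.0%, 0.7%, and 0.6%, matching the table, so the values are at least consistent with this interpretation.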
Correlated Neurons
Index    P. Corr.    Cos Sim.
9        +0.18       0.02
105      +0.13       0.01
380      +0.11       0.01
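Assuming "P. Corr." stands for Pearson correlation, the two columns can differ because Pearson correlation is the cosine similarity of mean-centered vectors. The sketch below illustrates that relationship on hypothetical toy vectors. It does not reproduce how the dashboard computes its numbers, and the source does not say which quantities are being compared.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as cosine similarity after centering."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical toy data: y is x shifted down by a constant.
x = [1.0, 2.0, 3.0]
y = [-1.0, 0.0, 1.0]

print(f"pearson = {pearson(x, y):.3f}")
print(f"cosine  = {cosine(x, y):.3f}")
```

Because `y` is just `x` with a constant subtracted, the Pearson correlation is 1 while the cosine similarity is noticeably lower, which shows how the same pair can score very differently on the two measures.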
Negative Logits
OE         -1.77
EP         -1.48
aged       -1.40
HN         -1.38
Pacific    -1.36
Ïģγ        -1.36
YE         -1.34
rise       -1.34
hire       -1.30
itti       -1.27
Positive Logits
ones         2.01
ely          2.00
edly         1.72
ivable       1.59
otal         1.57
bourg        1.53
eville       1.47
tons         1.44
iculously    1.44
arius        1.38
Activations Density: 0.011%