INDEX
Explanations
mentions of physical or emotional pain
Neuron Alignment
Index   Value   % of L₁
1034    +0.13   0.5%
1691    +0.11   0.4%
1350    +0.11   0.4%
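
The "% of L₁" column looks consistent with each value's magnitude divided by the L1 norm (sum of absolute values) of the full alignment vector. That reading is an assumption, not something this page states. A minimal sketch under that assumption, where `l1_share` is a hypothetical helper and the input vector is toy data rather than the real weights:

```python
def l1_share(weights, top_k=3):
    """Rank entries by |value| and report each one's share of the L1 norm.

    Assumption: "% of L1" means |value| / sum(|all values|).
    Returns a list of (index, value, share) tuples, largest magnitude first.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    ranked = sorted(enumerate(weights), key=lambda iw: abs(iw[1]), reverse=True)
    return [(i, w, abs(w) / l1) for i, w in ranked[:top_k]]


# Toy vector (not the real neuron weights).
rows = l1_share([0.5, -0.3, 0.2, 0.0], top_k=2)
for index, value, share in rows:
    print(f"{index:<8}{value:+.2f}   {share:.1%}")
```

Under this reading, a top value of +0.13 at 0.5% would imply an L1 norm of roughly 26, which also gives about 0.4% for +0.11.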
Correlated Neurons
Index   P. Corr.   Cos Sim.
203     +0.13      0.02
468     +0.11      0.02
1691    +0.11      0.02
Negative Logits
Pizarro   -0.47
Lehman    -0.43
Alain     -0.43
Swanson   -0.43
Alain     -0.42
Sar       -0.41
Delano    -0.41
Vog       -0.41
Cechy     -0.40
Coppola   -0.40
Positive Logits
hurt      1.17
Hurt      1.16
Hurt      1.01
hurt      1.00
hurts     0.93
hurting   0.91
Hurts     0.89
HUR       0.85
Hur       0.67
hurtful   0.65
Activations Density 0.067%