INDEX
Explanations
expressions of personal feelings and introspection
Neuron Alignment
Index   Value   % of L₁
501     +0.10   0.3%
860     +0.10   0.3%
1950    +0.10   0.3%
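The page does not define the "% of L₁" column. One plausible reading, assumed here and not confirmed by the source, is that "Value" is a neuron's weight in the feature direction and the percentage is that weight's absolute value divided by the L₁ norm of the whole direction. A minimal sketch under that assumption, using a made-up toy vector and a hypothetical helper name:

```python
def l1_share(weights):
    """Return each weight's absolute value as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The page does not state this.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical toy direction, not data taken from the page.
toy_direction = [0.5, -0.25, 0.25]
shares = l1_share(toy_direction)
print(shares)
```

Under this reading, a value of +0.10 at 0.3% would imply an L₁ norm of roughly 33 for the full direction, so no single neuron dominates the feature.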
Correlated Neurons
Index   P. Corr.   Cos Sim.
1101    +0.10      0.03
504     +0.10      0.03
1452    +0.10      0.02
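The two columns report different similarity measures, and the page does not say which vectors each one compares. As a general reminder of how they relate, Pearson correlation is the cosine similarity of the two vectors after each has been mean-centered, so the two numbers can differ for the same pair. A small sketch with hypothetical helper names and toy inputs:

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Toy vectors (not data from the page): perfectly anti-correlated,
# yet the raw cosine similarity is positive because all entries are positive.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
print(pearson_correlation(x, y), cosine_similarity(x, y))
```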
Negative Logits
maneu     -0.77
strick    -0.71
attemp    -0.70
lgbt      -0.68
horrend   -0.64
shenan    -0.64
toledo    -0.64
encomp    -0.63
increa    -0.62
resear    -0.61
Positive Logits
feel      0.56
ългария   0.55
felt      0.54
feels     0.52
feel      0.50
feeling   0.49
Feel      0.47
sento     0.47
sinto     0.47
πως       0.46
Activations Density 0.125%
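The page gives the density figure without a definition. A common convention, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below uses a hypothetical helper and a synthetic activation list chosen so that the result matches the 0.125% scale reported above:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (# active tokens) / (# sampled tokens).
    The page does not state how its figure was computed.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 1 active token out of 800 sampled tokens.
toy_activations = [0.0] * 799 + [1.3]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

At this density the feature fires on about one token in 800, which fits a narrow feature tied to a small family of words such as "feel" and "felt".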