INDEX
Explanations
verbs related to taking a certain action or making a particular statement
Neuron Alignment
Index    Value    % of L₁
991      +0.14    0.5%
1861     +0.13    0.5%
1805     +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
991      +0.14       0.03
1805     +0.13       0.02
1861     +0.12       0.03
Negative Logits
lü            -0.62
Oma           -0.60
panik         -0.59
alkoh         -0.58
kosme         -0.55
Letra         -0.54
induk         -0.54
elektronik    -0.53
rü            -0.53
kemer         -0.53
Positive Logits
pose      1.37
poses     1.33
posed     1.29
posing    1.28
Pose      1.07
pose      1.04
poser     0.97
Pose      0.95
poses     0.93
posé      0.92
Activations Density: 0.067%