INDEX
Explanations
personal pronouns (him, me, us) followed by actions or movements
Neuron Alignment
Index   Value   % of L₁
2034    +0.11   0.3%
674     +0.10   0.3%
1150    +0.09   0.3%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1081    +0.11           0.04
284     +0.10           0.04
1085    +0.09           0.03
Negative Logits
Redacción   -0.60
poveznice   -0.58
junho       -0.55
Pued        -0.55
semblait    -0.55
renova      -0.54
Lleg        -0.54
airpods     -0.54
ajudá       -0.53
bonsoir     -0.53
Positive Logits
ioe         0.59
Republics   0.56
Sepp        0.51
emirates    0.48
ropshire    0.47
Thier       0.47
peasantry   0.46
altham      0.46
.           0.45
Colla       0.45
Activations Density 0.168%