INDEX
Explanations
references to a boy or boys
Neuron Alignment
Index   Value   % of L₁
406     +0.12   0.5%
765     +0.10   0.4%
1413    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
406     +0.12      0.02
765     +0.10      0.02
130     +0.10      0.02
Negative Logits
Token    Logit
mme      -0.76
tse      -0.76
tanga    -0.74
sii      -0.70
rong     -0.69
michel   -0.67
fta      -0.67
hek      -0.66
kac      -0.65
kien     -0.64
Positive Logits
Token   Logit
boy     1.39
boys    1.30
Boy     1.28
boy     1.27
boys    1.24
Boy     1.21
Boys    1.15
BOY     1.13
Boys    1.11
BOYS    1.03
Activations Density 0.054%