Explanations
instances of the word "move."
Neuron Alignment
Index    Value    % of L₁
376      +0.22    1.2%
139      +0.14    0.8%
282      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
139      +0.22       0.04
426      +0.14       0.03
89       +0.11       0.03
Negative Logits
ĥ          -1.70
icum       -1.50
§          -1.49
ĥ½         -1.44
Į          -1.43
dere       -1.39
ľĵ         -1.37
»¿         -1.36
courage    -1.36
hedral     -1.34
Positive Logits
able        2.38
ability     1.72
legged      1.67
chip        1.58
manship     1.50
man         1.47
maker       1.47
thm         1.44
plant       1.43
controls    1.43
Activations Density 0.034%