Explanations
words related to specific locations or geographical features
Neuron Alignment
Index   Value   % of L₁
87      +0.11   0.6%
264     +0.11   0.6%
366     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
264     +0.11      0.01
366     +0.11      0.01
76      +0.11      0.01
Negative Logits
abbit    -1.74
?"       -1.66
etc      -1.62
/**<     -1.60
anni     -1.60
aned     -1.51
anus     -1.48
↵        -1.47
aten     -1.44
Figure   -1.43
Positive Logits
submissions   1.79
cco           1.79
victories     1.58
xim           1.58
uchy          1.54
win           1.53
bie           1.52
book          1.46
otherapy      1.43
punches       1.42
Activations Density: 0.122%