Explanations
the presence of the word "odd."
Neuron Alignment
Index   Value   % of L₁
156     +0.23   1.3%
198     +0.14   0.8%
351     +0.12   0.7%
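The dashboard does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it gives each entry's absolute value as a share of the L₁ norm of the feature's full vector over neurons. Under that assumption, every row implies an estimate of the norm, and the three estimates should roughly agree if the reading is right. A minimal sketch (the function name `implied_l1_norm` is hypothetical, and the rows are copied from the Neuron Alignment table):

```python
# Rows from the Neuron Alignment table: (neuron index, value, reported % of L1).
rows = [(156, 0.23, 1.3), (198, 0.14, 0.8), (351, 0.12, 0.7)]


def implied_l1_norm(value, percent):
    """L1 norm implied by one row.

    ASSUMPTION (not stated in the source): percent = 100 * |value| / L1_norm.
    """
    return abs(value) * 100.0 / percent


# One estimate of the norm per row.
estimates = [implied_l1_norm(value, percent) for _, value, percent in rows]

for (index, _, _), estimate in zip(rows, estimates):
    print(f"neuron {index}: implied L1 norm ~ {estimate:.1f}")
```

The three estimates fall between 17 and 18, which is consistent with the assumed definition given that the table values are rounded. It does not confirm that definition.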
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
360     +0.23           0.01
275     +0.14           0.00
23      +0.12           0.02
Negative Logits
Token       Logit
:`          -1.84
$/          -1.70
ÃŃa         -1.70
/-          -1.67
~:          -1.53
/**         -1.51
Telescope   -1.50
iom         -1.47
itat        -1.47
#{$         -1.46
Positive Logits
Token               Logit
Ł                   2.22
±                   2.19
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ   2.12
č↵                  2.10
↵↵                  2.10
(blank)             2.10
(blank)             2.10
↵                   2.10
(blank)             2.10
↵                   2.10
Activations Density 0.313%