Explanations
objects or items that can be contained or stored in a box
Neuron Alignment
Index    Value    % of L₁
478      +0.09    0.2%
1842     +0.08    0.2%
605      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1830     +0.09       0.03
118      +0.08       0.04
1167     +0.08       0.02
Negative Logits
suspic       -0.91
Haci         -0.91
Souha        -0.88
Keny         -0.84
utaf         -0.81
Juf          -0.80
philanth     -0.79
quoc         -0.78
maksi        -0.78
sophistic    -0.78
Positive Logits
all           0.96
all           0.86
everything    0.85
allemaal      0.78
alles         0.75
everything    0.74
tất           0.72
tudo          0.71
anything      0.70
etc           0.70
Activations Density 0.424%