INDEX
Explanations
specific names or identifiers related to brands or products
Neuron Alignment
Index   Value   % of L₁
457     +0.18   1.0%
410     +0.16   0.9%
111     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.18      0.01
457     +0.16      0.01
111     +0.12      0.01
Negative Logits
Token     Logit
"?        -1.78
?",       -1.64
inois     -1.57
enburg    -1.57
amento    -1.54
'?        -1.54
↵         -1.53
osing     -1.49
↵         -1.49
lect      -1.47
Positive Logits
Token     Logit
budgets   1.62
nee       1.50
eyed      1.49
ifi       1.48
ript      1.45
bit       1.42
BER       1.42
iah       1.41
          1.41
Stat      1.39
Activations Density 0.010%