INDEX
Explanations
phrases indicating excessive quantities or conditions
Neuron Alignment
Index    Value    % of L₁
256      +0.13    0.7%
373      +0.12    0.7%
178      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
256      +0.13       0.04
206      +0.12       0.03
166      +0.11       0.02
Negative Logits
Token         Logit
atrix         -1.71
ariat         -1.63
ientos        -1.53
illery        -1.50
apter         -1.48
ractor        -1.47
antibodies    -1.46
ient          -1.46
NET           -1.41
AY            -1.41
Positive Logits
Token       Logit
arching     2.13
grown       2.03
came        1.87
produced    1.84
blown       1.79
heard       1.76
loaded      1.74
reaching    1.71
writ        1.71
coming      1.71
Activations Density: 0.130%