INDEX
Explanations
phrases indicating specific conditions or characteristics
Neuron Alignment
Index   Value   % of L₁
156     +0.18   1.0%
125     +0.12   0.7%
369     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
454     +0.18      0.04
125     +0.12      0.04
244     +0.11      0.06
Negative Logits
table       -1.57
feas        -1.52
tables      -1.49
ible        -1.43
...\...\    -1.40
ersion      -1.39
irector     -1.37
icio        -1.37
не          -1.37
orld        -1.34
Positive Logits
Ī           2.96
↵           2.88
↵           2.88
(blank)     2.88
↵           2.88
↵           2.88
č↵          2.88
(blank)     2.88
↵↵          2.88
č↵          2.88
Activations Density 0.494%