INDEX
Explanations
phrases related to simplicity and plans
Neuron Alignment
Index   Value   % of L₁
680     +0.13   0.5%
1363    +0.13   0.5%
1964    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
680     +0.13      0.03
1793    +0.13      0.03
1964    +0.13      0.03
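
The Correlated Neurons table reports two similarity measures per neuron, and they can differ widely for the same pair (here +0.13 versus 0.03). The sketch below shows the two formulas side by side, assuming "P. Corr." stands for the Pearson correlation and "Cos Sim." for the cosine similarity; the dashboard does not state what vectors each measure is computed over, so the inputs `feature_acts` and `neuron_acts` are hypothetical illustration data, not values from this feature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def cosine(xs, ys):
    """Cosine similarity: normalized dot product, with no mean-centering."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Hypothetical activations over five tokens (assumed data, for illustration only).
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.5, -0.3, 0.9, 0.1, 0.4]

print("Pearson:", round(pearson(feature_acts, neuron_acts), 3))
print("Cosine: ", round(cosine(feature_acts, neuron_acts), 3))
```

The only difference between the two functions is the subtraction of the mean, which is why the measures coincide for zero-mean vectors and diverge otherwise.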
Negative Logits
depic        -1.04
intersper    -1.00
shenan       -0.99
disagre      -0.97
inappro      -0.95
gild         -0.93
accla        -0.92
encomp       -0.92
apprehen     -0.90
Shakspeare   -0.89
Positive Logits
simple       1.17
simple       1.12
Simple       1.12
Simple       1.10
simples      1.02
SIMPLE       0.98
SIMPLE       0.93
simplest     0.84
simplicity   0.82
simpler      0.82
Activations Density 0.076%