INDEX
Explanations
Phrases beginning with 'Well' that introduce additional detail or a change in content
Neuron Alignment
Index   Value   % of L₁
1265    +0.14   0.5%
381     +0.14   0.4%
872     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
58      +0.14      0.03
1265    +0.14      0.03
484     +0.11      0.02
Negative Logits
Hahahahaha   -0.85
Lma          -0.78
Noice        -0.77
Lmfao        -0.74
sappi        -0.73
lancia       -0.67
Ehh          -0.67
Hahah        -0.66
vogli        -0.66
Ikr          -0.66
Positive Logits
Well         0.97
Well         0.93
WELL         0.88
well         0.86
well         0.85
wells        0.81
WELL         0.77
Wells        0.77
wells        0.66
Wells        0.64
Activations Density: 0.036%