INDEX
Explanations
New Auto-Interp: expressions of optimism or future expectations
Neuron Alignment

| Index | Value | % of L₁ |
|---|---|---|
| 376 | +0.24 | 1.4% |
| 23 | +0.16 | 0.9% |
| 156 | +0.15 | 0.9% |
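One reading of the "% of L₁" column, offered as an assumption rather than the dashboard's documented definition, is each neuron's weight in the feature's vector divided by the L₁ norm of that whole vector. Under that reading the three rows imply an L₁ norm of roughly 17 (0.24 / 0.014 ≈ 17.1). The helper `neuron_alignment` and the one-weight-per-neuron list layout are hypothetical.

```python
def neuron_alignment(decoder, top_k=3):
    """Return (index, weight, fraction_of_l1) for the top_k largest weights.

    Hypothetical helper: `decoder` is assumed to hold one weight per neuron
    for a single feature.
    """
    l1 = sum(abs(w) for w in decoder)  # L1 norm of the whole vector
    # Rank neurons by signed weight, largest first (assumption: the card
    # lists the most positive weights).
    ranked = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)
    return [(i, decoder[i], decoder[i] / l1) for i in ranked[:top_k]]


toy = [0.25, -0.125, 0.5, 0.125]  # toy vector whose L1 norm is exactly 1.0
for index, weight, frac in neuron_alignment(toy):
    print(f"{index}  {weight:+.3f}  {frac:.1%}")
```

Dividing by the L₁ norm rather than reporting raw weights makes it easy to see how concentrated a feature is: here no single neuron carries more than about 1.4% of the total.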
Correlated Neurons

| Index | Pearson Corr. | Cos Sim. |
|---|---|---|
| 257 | +0.24 | 0.02 |
| 74 | +0.16 | 0.02 |
| 250 | +0.15 | 0.01 |
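A sketch of the two columns, assuming "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations on the same tokens, and "Cos Sim." is the cosine similarity between a weight vector for the feature and one for the neuron (which two vectors the dashboard compares is not stated on the card). The function names and the dict-of-lists layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed non-constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlated_neurons(feature_acts, neuron_acts, top_k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    Hypothetical layout: `neuron_acts` maps a neuron index to its
    activations on the same tokens as `feature_acts`.
    """
    scores = {i: pearson(feature_acts, acts) for i, acts in neuron_acts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


feature = [0.0, 1.0, 0.0, 2.0]
neurons = {5: [0.0, 2.0, 0.0, 4.0], 7: [2.0, 0.0, 2.0, 0.0]}
print(correlated_neurons(feature, neurons, top_k=2))
```

The two measures can disagree: a neuron may fire together with the feature (high correlation) while pointing in a nearly orthogonal weight direction (low cosine similarity), which is consistent with the small Cos Sim. values above.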
Negative Logits

| Token | Value |
|---|---|
| alike | -1.97 |
| § | -1.85 |
| ĥ½ | -1.76 |
| PI | -1.70 |
| erman | -1.68 |
| )]. | -1.62 |
| ERN | -1.56 |
| ser | -1.54 |
| atz | -1.51 |
| PF | -1.50 |
Positive Logits

| Token | Value |
|---|---|
| DAMAGE | 1.53 |
| Mn | 1.49 |
| Britain | 1.42 |
| yel | 1.38 |
| Void | 1.35 |
| balance | 1.34 |
| ghan | 1.33 |
| Unicode | 1.31 |
| bold | 1.29 |
| result | 1.27 |
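The two logit lists can be read as the tokens whose output logits the feature's direction most decreases and most increases. A minimal sketch, assuming the effect is the direct product of a feature direction with an unembedding matrix and ignoring any final normalization; `logit_weights`, `top_tokens`, and the nested-list matrix layout are hypothetical.

```python
def logit_weights(direction, unembed):
    """Project a d_model-length direction through a d_model x vocab matrix."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[j] * unembed[j][t] for j in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_tokens(weights, tokens, k, positive=True):
    """Return the k tokens with the largest (or, if not positive, most negative) weights."""
    order = sorted(range(len(weights)), key=lambda t: weights[t], reverse=positive)
    return [(tokens[t], weights[t]) for t in order[:k]]


direction = [1.0, -1.0]                       # toy feature direction
unembed = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]  # toy 2 x 3 unembedding
tokens = ["a", "b", "c"]
w = logit_weights(direction, unembed)
print(top_tokens(w, tokens, 1), top_tokens(w, tokens, 1, positive=False))
```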
Activation Density: 0.014%
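Activation density is presumably the share of tokens on which the feature fires; 0.014% is about 1.4 tokens in 10,000, or roughly one in 7,000. A sketch, assuming "fires" means an activation above zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One firing token out of 10,000 gives 0.01%; the card's 0.014% would be
# about 1.4 firings per 10,000 tokens.
acts = [0.0] * 9_999 + [3.2]
print(f"{activation_density(acts):.3f}%")
```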