Explanations
phrases indicating mathematical relationships involving factors
Neuron Alignment
Index   Value   % of L₁
407     +0.12   0.7%
53      +0.12   0.6%
306     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
30      +0.12      0.02
306     +0.12      0.02
407     +0.11      0.02
Negative Logits
ogen            -1.53
okinetic        -1.50
ometric         -1.40
aments          -1.40
ocracy          -1.40
ibil            -1.38
negatively      -1.37
increasingly    -1.34
ational         -1.33
hurt            -1.31
Positive Logits
ĨĴ                2.19
<|outofrange|>    2.02
↵                 2.02
↵                 2.02
↵                 2.02
<|outofrange|>    2.02
↵↵                2.02
<|outofrange|>    2.02
                  2.02
↵ ³³³             2.02
Activations Density 0.058%