INDEX
Explanations
phrases or expressions of indifference or flexibility
Neuron Alignment
Index | Value | % of L₁
13    | +0.15 | 0.8%
356   | +0.14 | 0.7%
246   | +0.12 | 0.7%
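The sketch below shows one way to read the Neuron Alignment columns. It assumes that "Value" is a neuron's entry in the feature's weight vector, that neurons are ranked by magnitude, and that "% of L₁" is that entry's absolute value divided by the L₁ norm of the whole vector. The function name `neuron_alignment` and the toy weights are hypothetical and are not this feature's real data.

```python
def neuron_alignment(weights, k=3):
    """Return the k largest-magnitude entries of a feature's weight vector.

    Each row is (neuron_index, value, share_of_l1). share_of_l1 is
    |value| divided by the L1 norm of the whole vector (assumed definition).
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weight vector has zero L1 norm")
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


# Toy 6-neuron weight vector (hypothetical, NOT the real feature's weights).
toy = [0.05, -0.02, 0.30, 0.10, -0.20, 0.33]
for idx, val, share in neuron_alignment(toy):
    print(f"{idx:>5}  {val:+.2f}  {share:.1%}")
```

Under this reading, each of the top three neurons carries less than 1% of the L₁ mass, so the feature is spread across many neurons rather than aligned with any single one.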
Correlated Neurons
Index | Pearson Corr. | Cosine Sim.
13    | +0.15         | 0.03
356   | +0.14         | 0.03
246   | +0.12         | 0.03
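As a rough guide to the Correlated Neurons columns, the sketch below assumes that both columns compare the feature's activations with a neuron's activations over the same tokens. Pearson correlation is the centered measure and cosine similarity the uncentered one. The helper names and toy activations are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation: the cosine similarity of mean-centered sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Toy per-token activations (hypothetical, not taken from the dashboard).
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0, 0.0]
neuron_acts = [0.1, -0.1, 1.5, 0.2, 0.4, -0.2]
print(f"Pearson {pearson(feature_acts, neuron_acts):+.2f}, "
      f"cosine {cosine(feature_acts, neuron_acts):.2f}")
```

Because Pearson correlation ignores a constant offset in either sequence while cosine similarity does not, the two columns can disagree noticeably for the same neuron.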
Negative Logits
ught
-1.70
ounder
-1.64
ffer
-1.60
elic
-1.53
hof
-1.49
ented
-1.45
oca
-1.41
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-1.36
resis
-1.36
Examin
-1.35
Positive Logits
Token     | Logit
else      | 2.16
ively     | 1.60
ifice     | 1.58
innings   | 1.57
manship   | 1.48
msgid     | 1.46
ELSE      | 1.45
happened  | 1.44
GENERATED | 1.42
Else      | 1.39
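A minimal sketch of how logit tables like these are commonly produced. It assumes that each score is the dot product of the feature's output direction with one token's column of the model's unembedding matrix, with any final layer norm ignored. The dashboard may compute these differently. `logit_effects`, the toy matrices, and the placeholder tokens are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=2):
    """Rank tokens by the direct logit contribution of a feature direction.

    direction: length-d_model list (the feature's output direction, assumed).
    unembed:   d_model rows, each with one entry per token in vocab.
    Returns (most_negative, most_positive) lists of (token, score).
    """
    # Score for token t is the dot product of direction with column t.
    scores = [
        sum(direction[d] * unembed[d][t] for d in range(len(direction)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    most_negative = [(vocab[t], scores[t]) for t in order[:k]]
    most_positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return most_negative, most_positive


# Toy 2-dimensional model with a 4-token vocabulary (hypothetical values).
direction = [1.0, -0.5]
unembed = [
    [2.0, -1.0, 0.5, 0.0],
    [0.0, 1.0, -1.0, 2.0],
]
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
neg, pos = logit_effects(direction, unembed, vocab)
print("negative:", neg)
print("positive:", pos)
```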
Activation Density: 0.144%
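This reading assumes that activation density is the fraction of sampled tokens on which the feature's activation is positive. On that basis, 0.144% corresponds to roughly one token in 700 (1 / 0.00144 ≈ 694). The sketch below shows the calculation. `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy activations over 8 tokens (hypothetical).
acts = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.2, 0.0]
print(f"{activation_density(acts):.3%}")  # 2 of 8 tokens fire
```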