INDEX
Explanations
expressions of disappointment
Neuron Alignment
Index   Value   % of L₁
1265    +0.10   0.3%
517     +0.08   0.2%
27      +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.10      0.04
690     +0.08      0.04
1343    +0.07      0.04
Negative Logits
encomp      -1.72
ftre        -1.60
ftu         -1.55
apprehen    -1.50
tranf       -1.50
fuf         -1.48
vns         -1.47
guarante    -1.45
purcha      -1.44
fays        -1.43
Positive Logits
Alas            1.19
Alas            1.18
minus           1.09
mighty          1.07
minus           0.98
Mighty          0.92
Mighty          0.87
tachment        0.84
Personendaten   0.82
mighty          0.81
Activations Density: 0.354%