INDEX
Explanations
mentions of benefits or advantages
Neuron Alignment
Index    Value    % of L₁
1310     +0.08    0.2%
1565     +0.08    0.2%
1174     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1310     +0.08       0.02
1565     +0.08       0.02
1941     +0.08       0.01
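
The two columns above measure different things, and a small sketch may help in reading them. The page does not define its column headers, so this assumes "P. Corr." is the Pearson correlation between two activation series over the same inputs and "Cos Sim." is the cosine similarity between two vectors. The function names and the toy inputs are hypothetical and are not taken from the page.

```python
import math


def cosine_sim(x, y):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    x_centered = [a - mean_x for a in x]
    y_centered = [b - mean_y for b in y]
    return cosine_sim(x_centered, y_centered)


# Toy values (hypothetical): the two series move together around different
# means, so the correlation is exactly 1 while the cosine similarity is lower.
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [5.0, 4.0, 9.0 - 2.0 * 1.0]  # placeholder series, see below
neuron_acts = [-1.0, 0.0, 1.0]

corr = pearson_corr(feature_acts, neuron_acts)
cos = cosine_sim(feature_acts, neuron_acts)
```

Because Pearson correlation removes the mean before comparing, the two numbers need not agree, which is one way a row can show different magnitudes in the two columns.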
Negative Logits
implacable    -0.52
Aula          -0.47
lamentable    -0.46
incompet      -0.46
evoc          -0.45
Juana         -0.45
meras         -0.45
AGRIC         -0.44
avowed        -0.43
cristina      -0.42
Positive Logits
Benefit       0.82
Benefit       0.80
benefit       0.78
benefit       0.75
BENE          0.71
benefited     0.65
benefits      0.64
Benefits      0.64
Benef         0.63
Benef         0.63
Activations Density 0.066%