INDEX
Explanations
names that end with 'ky'
Neuron Alignment
Index   Value   % of L₁
1491    +0.08   0.2%
1350    +0.07   0.2%
397     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.08      0.03
85      +0.07      0.02
99      +0.07      0.02
Negative Logits
Token     Logit
remain    -0.64
toMatch   -0.64
later     -0.62
<tfoot>   -0.62
unit      -0.62
站        -0.61
PRWEB     -0.61
we        -0.60
draw      -0.60
}{||      -0.59
Positive Logits
Token     Logit
ky        2.66
wherea    1.79
accla     1.75
emphat    1.75
KY        1.74
secon     1.73
volunte   1.67
depic     1.66
Intere    1.66
squa      1.65
Activations Density 0.163%