INDEX
Explanations
words related to media, attention, or promotion
Neuron Alignment
Index   Value   % of L₁
2011    +0.09   0.3%
938     +0.07   0.2%
1437    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1375    +0.09      0.02
1137    +0.07      0.03
1477    +0.07      0.03
Negative Logits
<bos>          -1.53
springfox      -0.70
<?             -0.65
<tfoot>        -0.65
ⓧ              -0.64
displayquote   -0.64
mergeFrom      -0.64
execSQL        -0.62
do             -0.61
맷             -0.60
Positive Logits
accla          1.76
affor          1.74
ftu            1.73
stockholm      1.72
impra          1.71
increa         1.69
fta            1.68
strick         1.61
thut           1.58
Juf            1.56
Activations Density 0.115%