Explanations
requests to listen to a podcast
Neuron Alignment
Index    Value    % of L₁
50       +0.27    1.3%
1059     +0.09    0.4%
101      +0.09    0.4%
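The sketch below shows one way a table like this could be produced. It rests on three assumptions that the dashboard does not state:

- "Value" is the feature's decoder weight on each neuron.
- "% of L₁" is |Value| divided by the L₁ norm of the whole decoder vector.
- The function name `l1_alignment` and its inputs are hypothetical.

```python
def l1_alignment(weights, k=3):
    """Return the k entries of `weights` with the largest magnitude.

    Each result is (neuron_index, signed_value, percent_of_l1).
    ASSUMPTION: `weights` is a feature's decoder vector over neurons, and
    "% of L1" means |value| / sum(|w|) expressed as a percentage.
    """
    total = sum(abs(w) for w in weights)  # L1 norm of the decoder vector
    # Rank neuron indices by absolute weight; ties keep their original order.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / total) for i in ranked[:k]]


print(l1_alignment([0.5, -0.25, 0.25, 0.0]))
```

The sign of the value is kept in the second column. The percentage uses the magnitude, so a strongly negative weight still counts toward alignment.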
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
791      +0.27            0.03
1356     +0.09            0.03
459      +0.09            0.03
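The two columns can be illustrated with plain-Python versions of both metrics. This sketch assumes the following:

- "Pearson Corr." compares the feature's activations with a neuron's activations over the same tokens.
- "Cos Sim." compares two weight vectors, such as the feature's decoder direction and a neuron's direction.
- The function names are illustrative, not part of any dashboard API.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors (ASSUMED meaning of Cos Sim.)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for a feature and one neuron.
print(pearson([0.0, 1.0, 3.0], [0.1, 0.9, 3.2]))
print(cosine([1.0, 0.0], [1.0, 1.0]))
```

The two metrics measure different things. A neuron can fire in step with a feature, giving a high correlation, while pointing in a nearly unrelated weight direction, giving a low cosine.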
Negative Logits
Token        Logit
<bos>        -3.42
ⓧ            -0.94
<?           -0.80
/*           -0.79
/**          -0.74
Географи     -0.73
             -0.73
<eos>        -0.70
util         -0.67
AddColumn    -0.67
Positive Logits
Token        Logit
maneu        2.21
increa       2.19
affor        2.14
reluct       2.12
volunte      2.09
guarante     2.08
impra        2.06
erad         2.04
accla        2.04
inev         2.04
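The logit lists can be understood as ranking every vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector. The sketch below makes a few assumptions:

- It ignores any final layer norm, which a real dashboard may apply.
- The function name `logit_effects` is hypothetical.
- The toy vocabulary is invented for illustration.

```python
def logit_effects(decoder, unembed_rows, vocab, k=3):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    ASSUMPTION: score = dot(decoder, unembedding row for the token), with no
    layer norm or bias applied.
    """
    scores = [
        (tok, sum(d * w for d, w in zip(decoder, row)))
        for tok, row in zip(vocab, unembed_rows)
    ]
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]                  # most negative first
    positive = scores[::-1][:k]            # most positive first
    return positive, negative


# Toy example: a 2-dimensional model with a 4-token vocabulary.
pos, neg = logit_effects([1.0, 0.0], [[2.0, 0.0], [-1.0, 5.0], [0.5, 0.0], [-3.0, 1.0]],
                         ["a", "b", "c", "d"], k=2)
print(pos, neg)
```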
Activation Density: 0.134%
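Activation density is commonly read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The function name and sample values are illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    ASSUMPTION: a feature counts as active on a token when its activation
    is strictly greater than the threshold (0 by default).
    """
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations over five tokens: two are nonzero.
print(activation_density([0.0, 0.5, 0.0, 0.0, 2.0]))
```

Under this reading, a density of 0.134% means the feature is active on roughly 1 token in 750, which fits a narrow explanation like the one above.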