INDEX
Explanations
words related to sound and audio
Neuron Alignment
Index    Value    % of L₁
50       +0.11    0.5%
1296     +0.07    0.3%
1671     +0.06    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
376      +0.11       0.05
258      +0.07       0.04
1372     +0.06       0.04
Negative Logits
Token           Logit
<bos>           -1.96
ⓧ               -0.86
public          -0.79
.               -0.74
displayquote    -0.74
,               -0.74
util            -0.70
continue        -0.70
HasIndex        -0.70
itemize         -0.70
Positive Logits
Token        Logit
maneu        2.02
accla        1.97
affor        1.97
ftu          1.94
increa       1.91
hcm          1.91
stockholm    1.89
emphat       1.89
impra        1.88
fta          1.87
Activations Density: 0.094%