Explanations
New Auto-Interp: Mentions of music-related words, such as "album," "song," "concerts," and "rock."
Neuron Alignment
Index   Value   % of L₁
50      +0.15   0.8%
1385    +0.06   0.4%
1870    +0.04   0.2%
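The Neuron Alignment table lists the MLP neurons that the feature's decoder vector points at most strongly. Below is a minimal sketch of how such rows could be produced. It assumes three things that the dashboard does not state: "Value" is a raw entry of the decoder vector in the neuron basis, "% of L₁" is that entry's magnitude divided by the vector's L₁ norm, and rows are ranked by magnitude. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron_index, value, fraction_of_l1) for the k largest-magnitude entries.

    Assumption (hypothetical): decoder_vec is the feature's decoder vector in the
    MLP-neuron basis, and "% of L1" = |value| / sum(|w| for w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Rank neuron indices by |weight|; sorted() is stable, so ties keep index order.
    order = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron decoder vector (made-up numbers for illustration only).
for index, value, frac in neuron_alignment([0.0, 0.5, -0.25, 0.25]):
    print(f"{index:<6}{value:+.2f}   {frac:.1%}")
```

Under this reading, a row such as "50  +0.15  0.8%" would imply a decoder vector whose L₁ norm is roughly 0.15 / 0.008 ≈ 19.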
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1385    +0.15           0.27
1870    +0.06           0.04
491     +0.04           0.11
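Both columns of the Correlated Neurons table compare the feature with a neuron. The sketch below assumes both are computed over activation vectors on the same set of tokens and that rows are ranked by Pearson correlation. Cosine similarity is the uncentered measure. Pearson correlation is the same formula applied after subtracting each vector's mean, which is why the two columns can differ in size or even in sign. The helper names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Uncentered similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper).

    feature_acts: the feature's activation on each token.
    neuron_acts: dict mapping neuron index -> activations on the same tokens.
    Returns (index, pearson, cosine) tuples, highest Pearson first.
    """
    rows = [
        (idx, pearson_correlation(feature_acts, acts), cosine_similarity(feature_acts, acts))
        for idx, acts in neuron_acts.items()
    ]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]
```

Take x = [1, 2, 3] and y = [3, 2, 1] as an example. Their cosine similarity is 10/14 ≈ 0.71, because every value is positive. Their Pearson correlation is -1 (up to floating-point rounding), because the two vectors move in opposite directions around their means.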
Negative Logits
<bos>           -2.83
HasAnnotation   -0.84
ⓧ               -0.84
/***            -0.77
//{             -0.76
ുറ              -0.75
of              -0.73
#![             -0.71
/**             -0.70
springfox       -0.69
Positive Logits
bandung   1.73
affor     1.68
véhic     1.66
haup      1.62
maneu     1.61
impra     1.57
napoli    1.57
increa    1.57
maroc     1.56
Intere    1.55
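The Negative Logits and Positive Logits lists rank vocabulary tokens by how much the feature's output direction lowers or raises each token's logit. Below is a minimal sketch. It assumes the effect is the feature's decoder vector multiplied by the unembedding matrix W_U, and it ignores the final LayerNorm that a real model applies before W_U. `logit_effects` and its argument layout are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return (top_positive, top_negative) lists of (token, effect) pairs.

    Assumptions (hypothetical layout): decoder_vec has d_model entries,
    unembed is W_U as d_model rows of len(vocab) floats, and the final
    LayerNorm that a real model applies before W_U is ignored.
    """
    d_model = len(decoder_vec)
    # effect[t] = sum_d decoder_vec[d] * W_U[d][t], i.e. decoder_vec @ W_U.
    effects = [
        sum(decoder_vec[d] * unembed[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative first, as in Negative Logits
    top_positive = ranked[::-1][:k]  # most positive first, as in Positive Logits
    return top_positive, top_negative
```

The two lists come from a single sorted pass, which is why the dashboard can show both ends of the vocabulary from one computation.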
Activations Density: 4.585%
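Activations Density is the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, and the threshold is exposed as a parameter because that cutoff is itself an assumption. At 4.585%, the feature would be active on roughly 46 of every 1,000 tokens.

```python
def activation_density(feature_acts, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold.

    Assumption: "active" means activation > 0 for a ReLU-style SAE feature.
    """
    if not feature_acts:
        raise ValueError("need at least one token")
    fired = sum(1 for a in feature_acts if a > threshold)
    return 100.0 * fired / len(feature_acts)


# Made-up activations on 8 tokens; the feature fires on 2 of them.
print(f"Activations Density: {activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]):.3f}%")
```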