INDEX
Explanations
words related to music and entertainment, such as song titles, band names, and record labels
Neuron Alignment
Index   Value   % of L₁
1870    +0.18   0.5%
1839    +0.08   0.3%
1535    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1870    +0.18      0.05
553     +0.08      0.06
1839    +0.08      0.06
Negative Logits
GEBURTSDATUM      -0.72
noyau             -0.71
refroidissement   -0.69
récompense        -0.60
GIVEREF           -0.59
makeConstraints   -0.59
useNavigate       -0.57
bootstrapcdn      -0.56
clôture           -0.56
loginUser         -0.56
Positive Logits
portait    0.80
ouvre      0.74
“…”        0.74
Avez       0.72
Quoi       0.72
Personne   0.71
Xoxo       0.70
Voyez      0.70
Autre      0.70
Cringe     0.69
Activations Density 0.742%