Explanations
references to music albums and songs
Neuron Alignment
Index   Value   % of L₁
1978    +0.13   0.4%
690     +0.12   0.4%
764     +0.12   0.3%
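The sketch below shows one way a table like this could be produced. It assumes that "Value" is the feature's decoder weight on a given neuron or dimension and that "% of L₁" is that weight's absolute value divided by the L1 norm of the whole decoder vector. The page does not state either definition, and the function name `neuron_alignment` is hypothetical.

```python
# Hypothetical sketch of a "Neuron Alignment" table.
# ASSUMPTION: "Value" is the decoder weight on a neuron/dimension and
# "% of L1" is |value| divided by the L1 norm of the whole decoder vector.
# Entries are ranked by signed value (the page shows only positive ones).


def neuron_alignment(decoder_vec, k=3):
    """Return (index, value, percent_of_l1) for the k largest entries."""
    l1 = sum(abs(v) for v in decoder_vec)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful alignment
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1)
            for i in ranked[:k]]


if __name__ == "__main__":
    toy_vec = [0.01, -0.02, 0.13, 0.05, -0.12, 0.02]  # made-up numbers
    for idx, val, pct in neuron_alignment(toy_vec):
        print(f"{idx:<8}{val:+.2f}   {pct:.1f}%")
```

Under this reading, a decoder vector with thousands of entries spreads its L1 norm thinly, so even the top entries would hold only a small share, which would be consistent with the sub-1% figures in the table.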
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1177    +0.13           0.02
981     +0.12           0.05
1902    +0.12           0.03
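The two columns can be illustrated with the standard formulas. This sketch assumes that the Pearson correlation is taken between the feature's activations and a neuron's activations over the same tokens, and that the cosine similarity compares the feature's decoder vector with the neuron's weight vector. The page does not confirm either pairing, and `pearson`, `cosine` and `top_correlated` are hypothetical names.

```python
import math

# Hypothetical sketch of a "Correlated Neurons" table.
# ASSUMPTIONS: "Pearson Corr." compares feature vs. neuron activations over
# the same tokens; "Cosine Sim." compares the feature's decoder vector with
# the neuron's weight vector.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def top_correlated(feature_acts, neuron_acts, feature_dir, neuron_dirs, k=3):
    """Rank neurons by Pearson correlation; return (index, corr, cos_sim)."""
    rows = [(j, pearson(feature_acts, acts), cosine(feature_dir, neuron_dirs[j]))
            for j, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]
```

Keeping the two measures separate matters: a neuron can co-fire with the feature (nonzero correlation) while pointing in a nearly orthogonal direction (cosine near 0), as the rows above appear to show.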
Negative Logits
<bos>            -0.79
Paglinawan       -0.77
хьтан            -0.68
ujednoznacz      -0.67
Aholisi          -0.67
تضيفلها          -0.66
Italijani        -0.66
Personensuche    -0.65
Jeografia        -0.65
nawr             -0.62
Positive Logits
shenan       1.59
maneu        1.48
apprehen     1.47
depic        1.43
pamph        1.42
accla        1.42
reluct       1.38
milf         1.37
cuck         1.35
sophistic    1.34
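Both logit lists can be read as the two ends of a single ranking. A minimal sketch, assuming each token's score is the dot product of the feature's decoder vector with that token's unembedding column. This is a direct logit effect that ignores the final layer norm, and the dashboard may compute or scale it differently. `logit_effects` and its arguments are hypothetical.

```python
# Hypothetical sketch of the "Positive/Negative Logits" lists.
# ASSUMPTION: score(token) = decoder_vec . unembed[:, token], a direct logit
# effect that ignores the final layer norm. `unembed` is d_model x vocab,
# stored as nested lists (row = residual dimension, column = token).


def logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return (top_k, bottom_k) lists of (token, score)."""
    d_model = len(decoder_vec)
    scores = [sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    bottom = [(vocab[t], scores[t]) for t in order[:k]]
    top = [(vocab[t], scores[t]) for t in order[::-1][:k]]
    return top, bottom


if __name__ == "__main__":
    top, bottom = logit_effects([1.0, 0.0],
                                [[0.5, -0.2, 1.5, -0.8], [9.0, 9.0, 9.0, 9.0]],
                                ["a", "b", "c", "d"], k=2)
    print("top:", top)
    print("bottom:", bottom)
```

Computing every score once and slicing both ends of the same sorted order guarantees that the two lists come from a single ranking, which is how the tables above are presented.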
Activation Density: 0.246%
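Written as a fraction, 0.246% is 0.00246, or roughly 2.5 activating tokens per 1,000. A minimal sketch of the computation, assuming density means the share of tokens on which the feature's activation is strictly above a threshold (0 by default). The page does not give the threshold, and `activation_density` is a hypothetical name.

```python
# Hypothetical sketch of "Activation Density".
# ASSUMPTION: density = fraction of tokens whose feature activation is
# strictly greater than `threshold` (0.0 by default).


def activation_density(acts, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`."""
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    acts = [0.0, 0.5, 0.0, 0.0, 1.2]  # made-up activations
    print(f"{100 * activation_density(acts):.3f}%")
```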