INDEX
Explanations
references to cultural standards and perceptions of beauty
Neuron Alignment
Index    Value    % of L₁
50       +0.18    0.7%
1978     +0.11    0.4%
378      +0.10    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
581      +0.18       0.09
378      +0.11       0.09
1055     +0.10       0.07
Negative Logits
<bos>          -2.18
intersper      -0.61
enshr          -0.57
amass          -0.54
//---          -0.53
/***           -0.53
condense       -0.52
tentatively    -0.52
defray         -0.51
harmonize      -0.51
Positive Logits
anymore     0.99
signora     0.92
bandung     0.91
quoique     0.82
nor         0.81
postolic    0.81
tristes     0.80
warung      0.79
jawa        0.78
quarelle    0.77
Activations Density: 1.232%