INDEX
Explanations
phrases related to hiding or concealing something controversial or negative
Neuron Alignment
Index   Value   % of L₁
1222    +0.13   0.6%
892     +0.12   0.6%
50      +0.11   0.5%
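The page does not define the "% of L₁" column. One plausible reading, assumed here and not confirmed by the source, is that each row gives a neuron's weight in the feature's decoder direction and that weight's absolute share of the direction's L₁ norm. Under that reading, +0.13 at 0.6% implies an L₁ norm of roughly 0.13 / 0.006 ≈ 22. Below is a minimal sketch of that calculation. `neuron_alignment` and its inputs are hypothetical names.

```python
def neuron_alignment(decoder_row, top_k=3):
    """Hypothetical helper: rank neurons by signed decoder weight and report
    each weight's absolute share of the row's L1 norm, as a percentage."""
    l1 = sum(abs(w) for w in decoder_row)
    # Sort neuron indices by signed weight, largest (most positive) first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:top_k]]


if __name__ == "__main__":
    # Toy decoder row with 3 neurons (made-up values, not from the dashboard).
    for index, value, pct in neuron_alignment([0.5, -0.25, 0.25], top_k=2):
        print(index, f"{value:+.2f}", f"{pct:.1f}%")
```

Ranking by signed weight matches the table, which lists only positive values in descending order. Ranking by absolute value would be an equally reasonable alternative.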
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1805    +0.13           0.02
690     +0.12           0.02
597     +0.11           0.02
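The two scores in this table are standard similarity measures. The page does not say which vectors are compared, so the following is an assumption. The correlation column is read as the Pearson correlation between the feature's and a neuron's activations over the same tokens. The cosine-similarity column is read as the cosine between two weight vectors. Below is a minimal sketch of both measures in plain Python.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (e.g. two weight directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Made-up per-token activations for a feature and one neuron.
    feature_acts = [0.0, 1.2, 0.0, 3.1]
    neuron_acts = [0.1, 0.9, -0.2, 2.0]
    print(round(pearson(feature_acts, neuron_acts), 3))
    print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))
```

A high correlation paired with a near-zero cosine, as in the rows above, would mean the activations co-vary even though the weight directions are nearly orthogonal. This holds only if the assumed reading is correct.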
Negative Logits
<bos>           -2.62
lateinit        -0.79
},{             -0.74
ComponentModel  -0.74
/***            -0.74
HasIndex        -0.70
dst             -0.69
src             -0.68
glTexCoord      -0.67
nmgp            -0.65
Positive Logits
stockholm  1.59
emphat     1.59
!...       1.55
?...       1.54
vhs        1.54
milf       1.53
indestru   1.52
affor      1.50
hentai     1.48
increa     1.47
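A common way to produce lists like the negative and positive logits is to multiply the feature's decoder direction by the model's unembedding matrix and keep the lowest- and highest-scoring tokens. The page does not describe its method, so treat this as an assumption. The sketch also ignores any final layer norm the model may apply. `logit_effects`, `direction`, `unembed`, and `vocab` are hypothetical names.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Hypothetical helper: score every token by direction @ unembed and return
    the k most negative and k most positive (token, score) pairs.

    direction: decoder vector of length d_model
    unembed:   d_model x vocab_size matrix as nested lists (rows = model dims)
    vocab:     token strings, one per unembedding column
    """
    scores = [
        sum(direction[d] * unembed[d][t] for d in range(len(direction)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dim model with a 3-token vocabulary (made-up values).
    neg, pos = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
    print(neg, pos)
```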
Activation Density: 0.253%
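Activation density is read here as the percentage of tokens on which the feature fires, meaning its activation exceeds a threshold. The threshold is assumed to be zero. A density of 0.253% corresponds to roughly 2.5 tokens in every 1,000. Below is a minimal sketch. `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 253 active tokens out of 100,000 gives a density of 0.253%.
    acts = [1.0] * 253 + [0.0] * (100_000 - 253)
    print(f"{activation_density(acts):.3f}%")
```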