INDEX
Explanations
words and phrases related to subversion and undermining authority or norms
Negative Logits
ions          -0.18
leon          -0.17
วน            -0.15
enberg        -0.15
dr            -0.15
ario          -0.15
Kot           -0.14
eagle         -0.14
.extensions   -0.14
lick          -0.14
Positive Logits
ersive        0.21
sub           0.21
=sub          0.20
[sub          0.20
verted        0.19
jug           0.19
standard      0.19
rosa          0.19
terr          0.19
(sub          0.18
Activations Density 0.018%