Explanations
references to historical or cultural enslavement and personal identity
Negative Logits
кид       -0.15
angel     -0.14
vacc      -0.14
ết        -0.14
Angel     -0.14
oram      -0.14
.sym      -0.14
vrier     -0.13
picture   -0.13
791       -0.13
Positive Logits
\Php      0.19
ยง        0.17
slave     0.17
arkan     0.16
slave     0.15
asp       0.15
slaves    0.14
иму       0.14
。。↵↵    0.14
ukan      0.14
Activations Density 0.002%