INDEX
Negative Logits
oeff
-0.07
acomp
-0.07
uese
-0.06
annoying
-0.06
urface
-0.06
かけ
-0.06
consum
-0.06
會
-0.06
bigot
-0.06
perm
-0.06
Positive Logits
ư
0.07
登录
0.07
knives
0.06
lius
0.06
erotik
0.06
("/",
0.06
witness
0.06
_bridge
0.06
<Document
0.06
-auth
0.06
Activations Density 0.023%