Explanations
variations of the word "Bastard."
Negative Logits
антаж     -0.17
alet      -0.17
永久      -0.16
دی        -0.16
aal       -0.16
itaire    -0.15
aft       -0.15
rý        -0.14
ract      -0.14
crets     -0.14
Positive Logits
ardu      0.21
eful      0.21
ürk       0.21
rop       0.20
urma      0.20
efully    0.19
ard       0.19
ech       0.19
ogne      0.19
ards      0.19
Activations Density 0.014%