Explanations
religious references or terms associated with crucifixion
Negative Logits
annes     -0.17
ookie     -0.16
ook       -0.16
preneur   -0.16
erva      -0.16
bip       -0.15
enc       -0.15
обл       -0.15
iland     -0.14
erte      -0.14
Positive Logits
edes          0.19
-Compatible   0.15
cul           0.15
Canonical     0.14
adora         0.14
ัมพ           0.14
Canonical     0.14
CommandLine   0.14
allon         0.14
_DST          0.14
Activations Density: 0.004%