Explanations
words related to bodily waste
references to animal waste and excrement
Negative Logits
nee        -0.71
listed     -0.69
NESS       -0.66
Azerbai    -0.66
eer        -0.64
aido       -0.64
sighted    -0.64
addons     -0.62
boycot     -0.62
broad      -0.62
Positive Logits
poop          1.04
feces         0.81
flush         0.77
ÅĤ            0.73
microbiota    0.72
spawn         0.71
opy           0.70
apons         0.68
Yoshi         0.67
alogy         0.67
Activations Density 0.008%