Explanations
insulting or derogatory terms to describe people
harsh insults and derogatory language
Negative Logits
Token        Logit
imester      -0.87
AMS          -0.80
ItemImage    -0.78
istry        -0.77
conom        -0.75
isine        -0.73
eton         -0.69
ISTORY       -0.69
anship       -0.68
readiness    -0.67
Positive Logits
Token        Logit
bunny        1.26
bastard      1.25
puppy        1.22
guy          1.21
dude         1.20
monkey       1.19
bitch        1.17
gorilla      1.17
ape          1.17
jerk         1.17
Activations Density: 0.664%