INDEX
Explanations
words related to things that are considered inappropriate, shocking, or upsetting
instances of the word "offensive" in various contexts
Negative Logits
chell       -0.96
Deal        -0.77
aret        -0.77
omed        -0.72
ho          -0.71
omething    -0.71
perature    -0.71
Cind        -0.67
bourg       -0.66
luck        -0.66
Positive Logits
offensive       0.77
ity             0.75
thrust          0.72
posture         0.71
thouse          0.70
linemen         0.67
ities           0.67
against         0.64
guessing        0.64
contraception   0.63
Activations Density 0.023%
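
The density figure is commonly read as the fraction of token positions on which the feature is active. The sketch below illustrates that reading only; the page does not state how the value was computed, so the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means active positions / total positions.
    The dashboard's own definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4242] = 0.4

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.020%
```

Under this reading, a density of 0.023% would correspond to roughly 23 active positions per 100,000 tokens.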