Explanations
phrases related to being offended or causing offense
references to being offended or causing offense
Negative Logits
issue        -0.74
liner        -0.72
runner       -0.71
tom          -0.69
aver         -0.69
negie        -0.68
hart         -0.66
nosis        -0.66
ynthesis     -0.65
gravity      -0.65
Positive Logits
offended      0.97
offend        0.94
insulted      0.84
indecent      0.80
Sax           0.75
sexist        0.74
Yiannopoulos  0.73
ingly         0.72
bystanders    0.72
offending     0.71
Activation Density: 0.018%