Explanations
words related to insults and derogatory language
references to insults and derogatory remarks
Negative Logits
negie       -0.73
ccording    -0.72
ullivan     -0.70
ucket       -0.68
illon       -0.68
ilver       -0.68
ail         -0.67
arten       -0.66
angler      -0.65
ooth        -0.65
Positive Logits
ingly        1.02
insult       0.99
ãĤ¹ãĥĪ       0.89
insulted     0.86
insulting    0.83
insults      0.83
prejudice    0.77
disrespect   0.76
ridicule     0.75
hurled       0.74
Activations Density 0.022%