INDEX
Explanations
derogatory terms or insults directed at individuals
derogatory labels or insults used in social or political contexts
NEGATIVE LOGITS
Token       Logit
foreseen    -0.73
adjoining   -0.73
conclud     -0.70
inclined    -0.68
effected    -0.64
governed    -0.63
amplified   -0.63
contexts    -0.62
resided     -0.62
adjacent    -0.60
POSITIVE LOGITS
Token       Logit
liar        0.85
"           0.79
'           0.76
traitor     0.76
fascist     0.75
junk        0.75
Tes         0.74
nuisance    0.74
coward      0.72
ãĢİ         0.72
Activations Density 0.123%
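
The last positive-logit token, ãĢİ, looks like a byte-level BPE rendering of a multi-byte character rather than literal text. The sketch below shows how such a token could be decoded, assuming the model uses a GPT-2-style byte-to-unicode mapping. That assumption is not stated anywhere on this page, and the function names are illustrative.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table.

    Assumption: the tokenizer behind this dashboard uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_byte_level_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8")


if __name__ == "__main__":
    decoded = decode_byte_level_token("ãĢİ")
    print(decoded, hex(ord(decoded)))
```

Under that assumption the token decodes to 『 (U+300E, a left white corner bracket), an opening quotation mark, which would be consistent with the " and ' tokens elsewhere in the list.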