Explanations
negative terms or insults directed towards individuals
derogatory labels or insults directed at individuals
Negative Logits
Token         Logit
foreseen      -0.65
eto           -0.64
ubuntu        -0.62
strip         -0.61
olar          -0.61
ispers        -0.61
iasm          -0.61
uce           -0.59
lured         -0.59
Remastered    -0.59
Positive Logits
Token         Logit
SourceFile    0.71
Tes           0.70
"             0.70
necessity     0.66
bluff         0.65
'             0.64
Jem           0.63
nuisance      0.61
="#           0.61
ãĢİ           0.60
Activations Density 0.129%