Explanations
Offensive language
Colloquial, informal, or slang language (including expletives) used in a conversational tone.
Negative Logits
Token          Logit
requires       -0.07
isAdmin        -0.07
فارس           -0.07
neden          -0.07
kod            -0.07
Indeed         -0.06
organizations  -0.06
FC             -0.06
öt             -0.06
γκο            -0.06
Positive Logits
Token          Logit
shit           0.07
_stuff         0.07
stuff          0.07
cops           0.07
距离           0.06
.btnClose      0.06
bananas        0.06
(predicate     0.06
eview          0.06
maliyet        0.06
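SAE feature dashboards commonly produce lists like these by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes that definition; the names `logit_effects`, `top_logits`, `decoder_dir` and `W_U`, and every number in the toy example, are hypothetical and are not taken from this page.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction onto the unembedding.

    Assumption: W_U has d_model rows and len(vocab) columns, so the
    effect on token j is the dot product of decoder_dir with column j.
    """
    return {
        tok: sum(d * W_U[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }

def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
vocab = ["shit", "stuff", "requires", "isAdmin"]
W_U = [[0.6, 0.3, -0.4, -0.2],
       [0.2, 0.1, -0.1, -0.5]]
decoder_dir = [1.0, 1.0]

neg, pos = top_logits(logit_effects(decoder_dir, W_U, vocab), k=2)
print("negative:", neg)
print("positive:", pos)
```

With real model weights the vocabulary has tens of thousands of entries, so an implementation would use a matrix library rather than nested Python loops; the ranking step is the same.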
Activation density: 0.317%
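Activation density is typically reported as the share of sampled tokens on which the feature's activation is above zero; 0.317% corresponds to roughly 3 tokens in every 1,000. The sketch below assumes that definition (the function name `activation_density` and its `threshold` parameter are hypothetical) and reproduces the figure from a synthetic sample of 100,000 tokens, 317 of which fire.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Synthetic sample: 317 firing tokens out of 100,000.
acts = [0.0] * 99_683 + [1.5] * 317
density = activation_density(acts)
print(f"{density:.3%}")
```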