INDEX
Explanations
expressions related to offensiveness and inappropriate language
Negative Logits
__':               -0.84
saraba             -0.77
parsedMessage      -0.76
EconPapers         -0.73
WebDriverWait      -0.72
تضيفلها            -0.68
]")]               -0.68
tagHelperRunner    -0.66
Савезне            -0.65
Искәрмәләр         -0.65
Positive Logits
offended           1.60
offend             1.47
offense            1.38
offence            1.37
offending          1.35
offensive          1.23
Offense            1.14
offensive          1.13
Offensive          1.07
affront            1.07
Activations Density 0.426%
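
The density figure above is commonly read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard, which does not state how its figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1000 token positions:
# 996 positions where the feature is silent and 4 where it fires.
sample = [0.0] * 996 + [0.3, 1.2, 0.7, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.426% would mean the feature is active on roughly 4 of every 1000 sampled tokens.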