Explanations
Instances of derogatory language and criticism directed at individuals, particularly in discourse about other people
Negative Logits
IUrlHelper          -0.60
kasarigan           -0.58
adaptiveStyles      -0.56
fromnode            -0.54
matchCondition      -0.52
WebElementEntity    -0.48
醐                  -0.47
mania               -0.46
躇                  -0.46
MainAxisSize        -0.45
Positive Logits
insulting           0.61
criticisms          0.59
insults             0.57
derogatory          0.56
insult              0.55
criticism           0.54
accusations         0.51
dispar              0.51
mocking             0.51
disrespectful       0.50
Activations Density 0.345%