Explanations
references to protests and political grievances
Follows words indicating negativity or opposition
negative framing or specific pairings
Negative Logits
YMMV  -0.55
contributors  -0.51
MessageTagHelper  -0.49
$.}  -0.48
addContainerGap  -0.48
jspb  -0.48
ześnie  -0.48
脚注の使い方 (Japanese: "how to use footnotes")  -0.47
LabelTagHelper  -0.46
$__  -0.46
Positive Logits
shameless  0.64
blackmail  0.62
hatched  0.62
transfieras  0.61
fooling  0.61
spoiling  0.59
biased  0.58
oredCriteria  0.58
PreferredItem  0.57
illogical  0.56
Activation Density 0.160%