Explanations
phrases related to internet user behavior and online community management
references to spam and platform moderation policies
Negative Logits
サーティワン    -0.92
Clarke          -0.80
Jennings        -0.77
Bundy           -0.76
Desmond         -0.76
ensemble        -0.74
Quart           -0.73
Calder          -0.71
Roland          -0.69
Bauer           -0.69
Positive Logits
spam            1.85
scams           1.34
abusive         1.25
scam            1.25
bots            1.24
harassing       1.24
slander         1.23
malicious       1.20
imperson        1.17
annoying        1.17
Activations Density: 0.464%