INDEX
Explanations
words related to online forum moderation and rules enforcement
guidelines and rules for posting in an online thread
Negative Logits
ichick
-0.77
fitted
-0.76
Romo
-0.71
Paraly
-0.70
isin
-0.69
equipped
-0.68
jets
-0.67
サーティワン
-0.67
arming
-0.66
Siem
-0.65
Positive Logits
1.20
reddits
1.18
1.13
subreddits
1.11
moderation
1.07
1.07
moderators
1.07
1.04
Forums
1.04
subreddit
1.04
Activations Density 0.272%