Explanations
references to moderation in a community or platform context
following Mod
Negative Logits
Token       Logit
uxxxx       -0.33
쁘          -0.31
mentare     -0.30
ittu        -0.29
Caja        -0.29
currently   -0.28
Czer        -0.27
രിക്ക       -0.27
Iyer        -0.27
äume        -0.27
Positive Logits
Token       Logit
Mod         2.59
Mod         2.42
Mods        1.70
mods        1.65
Mods        1.62
MOD         1.61
mod         1.55
mod         1.55
MOD         1.53
mods        1.44
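The logit lists report how strongly this feature's output raises (positive) or lowers (negative) each vocabulary token's logit. A minimal sketch of one common way such lists are produced, assuming the feature's decoder vector is projected through the model's unembedding matrix and the results are ranked. The function name `top_logit_effects`, the toy vectors, and the four-token vocabulary are hypothetical illustration values, not this feature's actual weights.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by the logit change a feature direction induces.

    Assumption: effect on token j = dot(decoder_vec, column j of unembed).
    decoder_vec: list of d_model floats (hypothetical feature direction)
    unembed: d_model x vocab_size nested list (hypothetical unembedding)
    vocab: list of vocab_size token strings
    Returns (top_positive, top_negative) lists of (token, effect) pairs.
    """
    d_model = len(decoder_vec)
    # Dot product of the decoder vector with each unembedding column.
    effects = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, effects))
    # Largest effects first for the positive list, smallest first for the negative list.
    top_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return top_positive, top_negative


# Toy example with d_model = 2 and a four-token vocabulary (hypothetical values).
vocab = ["Mod", "mods", "currently", "Caja"]
decoder_vec = [1.0, 0.5]
unembed = [
    [2.0, 1.0, -0.2, 0.0],
    [1.0, 1.0, 0.0, -0.8],
]
pos, neg = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print(pos)  # [('Mod', 2.5), ('mods', 1.5)]
print(neg)  # [('Caja', -0.4), ('currently', -0.2)]
```

Case and spacing variants of a word (for example "Mod" appearing twice) can surface as separate entries because tokenizers often assign distinct IDs to forms that differ only in a leading space.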
Activation Density: 0.003%
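To give the density figure a concrete scale: 0.003% corresponds to roughly 3 active tokens per 100,000. A minimal sketch, assuming density is the fraction of sampled tokens on which the feature's activation is strictly above zero; the helper `activation_density` and the synthetic activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 100,000 tokens, of which the feature fires on 3.
acts = [0.0] * 100_000
for idx in (10, 500, 90_000):
    acts[idx] = 1.2

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.003%
```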