Explanations
- Terms related to law enforcement and legal actions
- Actions related to implementing bans and restrictions
Negative Logits
Token         Logit
ebus          -0.81
Sov           -0.72
rious         -0.70
rouse         -0.66
united        -0.66
Dynamics      -0.65
gow           -0.65
swick         -0.65
lycer         -0.64
orthy         -0.62
Positive Logits
Token         Logit
altogether     1.21
unnecessary    0.98
offending      0.94
entirely       0.94
outright       0.90
abruptly       0.89
unwanted       0.88
bothering      0.86
disbelief      0.82
inhib          0.81
Activation density: 4.360%