Explanations
phrases related to physical violence or aggressive behavior
mentions of the word "beat."
Negative Logits
asel                     -0.68
ateral                   -0.67
condem                   -0.66
orrow                    -0.65
Import                   -0.64
pard                     -0.62
isse                     -0.61
BuyableInstoreAndOnline  -0.61
OPLE                     -0.61
amera                    -0.60
Positive Logits
rice    1.14
beat    1.08
down    1.07
boxing  0.99
downs   0.97
nik     0.94
tle     0.93
ework   0.91
ings    0.89
hered   0.87
Activations Density 0.045%