Explanations
instances of physical violence or force
expressions related to physical acts of beating or striking
Negative Logits
Token        Logit
orrow        -0.71
sol          -0.64
Newsletter   -0.64
iscal        -0.64
alg          -0.63
export       -0.63
berra        -0.63
osion        -0.62
omal         -0.61
responsible  -0.61
Positive Logits
Token        Logit
beating      1.08
beat         0.98
beat         0.93
beaten       0.93
Beat         0.84
beats        0.84
Beat         0.80
enance       0.79
Rouse        0.77
whipping     0.77
Activations Density: 0.009%