INDEX
Explanations
words related to physical attacks or aggressive behavior
references to various forms of assault
Negative Logits
Token        Logit
ãĤ©          -0.71
snipp        -0.71
ocular       -0.68
FORE         -0.66
overed       -0.66
milo         -0.65
Solitaire    -0.65
CFR          -0.65
ETA          -0.63
darn         -0.63
Positive Logits
Token        Logit
assault      0.84
uous         0.84
ments        0.82
ive          0.80
ively        0.79
quez         0.79
perpetrated  0.77
iveness      0.77
entimes      0.75
assaults     0.74
Activations Density: 0.021%